diff --git a/.gitignore b/.gitignore index 09e937c..48c4b97 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,5 @@ .claude/settings.local.json .claude/agent-memory/ -.claude/scheduled_tasks.lock # Python __pycache__/ diff --git a/AGENTS.md b/AGENTS.md index c64af40..9e7350d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -65,7 +65,7 @@ See [[agent-change-process]] for the full methodology. ./pulumi/ # Pulumi IaC (tailnet ACLs, dns, cloud) ~/.config/{nvim,fish} # user's shell config, managed by chezmoi ~/code/personal/ # user's projects -~/code/personal/zk # user's zettelkasten (Obsidian-sync). Reference-data source; migrating into heph docs (hephaestus). +~/code/personal/zk # user's Obsidian-sync managed zettelkasten. Potential source for reference data. ~/code/3rd/ # mirrored external projects ~/code/work # FORBIDDEN ``` @@ -147,16 +147,10 @@ Create a new spork: `mise run spork-create ` ## Task Discovery -BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`), -the user's self-hosted context/task system. Fetch them with the CLI: - ```fish -heph list --project Blumeops --json # outstanding Blumeops tasks as JSON +mise run blumeops-tasks # fetch from Todoist, sorted by priority ``` - -(This replaced the retired `blumeops-tasks` mise task, which read from Todoist.) - -Most operational scripts are stored in `./mise-tasks/`. For scripts with any logic or +Most tasks are stored in `./mise-tasks/`. For scripts with any logic or complexity, use uv run --script 's with explicit dependencies. Complex workflows with artifacts should become dagger pipelines. Mise tasks are for development processes and operations - tools for the user or the agent. diff --git a/CHANGELOG.md b/CHANGELOG.md index 0499154..7ae5f8e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,259 +12,6 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). 
-## [v1.17.0] - 2026-06-03 - -### Features - -- Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle - picker, and prize assignment console — on ringtail k3s with `shower.eblu.me` - as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App - source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app). -- Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. Replaces the - boolean lock with a four-phase `ShowerState` (`pre_event` → `party` → - `prizes_locked` → `event_locked`), adds an append-only "guest memories" - panel where guests can leave photos and comments for the baby, and - polishes the admin and QR views. Three Django migrations - (`0009_shower_phase`, `0010_guest_memories`, `0011_book_description`) - run automatically in the entrypoint against the SQLite PV. No config - or env-var changes. - - Container build also gains a Forgejo-PyPI workaround: Forgejo's simple - index returns absolute file URLs hardcoded to the public ROOT_URL - (`forge.eblu.me`), which the Fly edge 403s on `/api/packages/*`. The - wheel and sdist are now both pulled via direct `fetchurl` against - `forge.ops.eblu.me` (tailnet-only) and the wheel is handed to pip as - a local path. -- `review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings. -- runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos. - -### Bug Fixes - -- Fix nightly borgmatic backups failing for 2 days. 
The shower SQLite - dump hook referenced `kubectl --context=k3s-ringtail`, but indri's - kubeconfig deliberately doesn't carry the ringtail credentials. The - `before_backup` hook's failure aborted the entire run, taking out - *both* the local sifaka repo and the BorgBase offsite. Replaced - the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump` - helper deployed by the ansible role. Each dump entry now declares a - `target` of either `local:` (mealie — kubectl uses indri's - kubeconfig) or `ssh:` (shower — ssh into ringtail and - run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml - on ringtail is mode 644 so no sudo required). Bytes stream back via - `kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl - cp` requires `tar` inside the pod and nix-built images like shower - don't bundle it. -- Shower app container now bakes the wheel + Python deps into the image - at build time via `buildPythonPackage` instead of pip-installing on - first boot. Boots are deterministic and don't depend on forge PyPI - being reachable from the pod. The `wheelHash` in - `containers/shower/default.nix` is the sha256 sourced from the - [forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/); - bumping the version means bumping that hash too. - - Borgmatic now covers the shower app: SQLite is dumped from the live - pod via `kubectl exec` (mirroring the existing mealie entry, with - `context: k3s-ringtail`), and the prize-photo media share is picked up - through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as - `/Volumes/photos`). -- Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it. 
-- Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests. -- Fixed homepage container EACCES on cold start: the nix-built image now chowns - `/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the - behavior of the old Dockerfile. Without this, homepage couldn't seed missing - skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on - its first uncached request. Caught during the ringtail cutover. -- Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings. - - Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section. -- Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`. - -### Infrastructure - -- Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated - paperless + teslamate databases) into backups and Grafana. Adds a Tailscale - LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4 - route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` + - `paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the - Grafana TeslaMate datasource at the ringtail DB. 
Closes the backup gap that - opened at cutover (the migrated live data was still being backed up from the - now-frozen minikube copies) and unblocks the wave-1 decommission. -- Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64). - The container is now built via nix (`containers/homepage/default.nix`), adapted - from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and - wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on - minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, - Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries - in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama) - auto-populate via Ingress annotations. -- Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down. -- Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]]. -- Move the entire Immich stack — server, machine-learning, valkey, - and the PostgreSQL+VectorChord cluster — off `minikube-indri` and - onto `k3s-ringtail`. 
Postgres data migrated zero-loss via CNPG - `pg_basebackup` (replica catch-up then promote); row counts on - `asset`, `user`, `album`, `smart_search`, `activity`, `asset_face` - verified equal between source and replica before cutover. The ML - pod now uses ringtail's RTX 4080 via the nvidia-device-plugin - (time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy - routing at `photos.ops.eblu.me` is unchanged (still - `photos.tail8d86e.ts.net`, the device just lives on ringtail now). - Borgmatic backups continue against the same `immich-pg` tailnet - hostname. First concrete chain in the broader indri-k8s - decommission effort. -- Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration. -- Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`. -- Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. 
Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking. -- Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: ` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed. -- Wire shower app for public exposure: fly nginx `shower.eblu.me` server - block as a guest-only surface — splash page, `/prizes//`, static - assets, media. Everything authenticated (`/admin/`, `/host/`, - `/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit - `shower.ops.eblu.me` for the operator console + admin; the app's - v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on - the tailnet point back at the WAN host for guests. Plus a Caddy route - on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking - request rate, error rate, latency, bandwidth, and access logs. -- Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate. -- Add nix-built amd64 valkey for ringtail (`containers/valkey/default.nix`) so immich-ringtail can stop pulling the upstream multi-arch `docker.io/valkey/valkey` image. Existing `container.py` continues to build Alpine arm64 for paperless on indri. 
Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7). -- Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments - (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on - indri). Pulls in stable database observability (v1.15) and the OTel Collector - v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger - `container.py` per the build-container-image migration playbook. -- Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper. -- Decommission the wave-1 services on minikube-indri now that paperless, - teslamate, and mealie run on ringtail with their data backed up. Removes the - minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app - definitions (pruning the parked Deployments, Services, and the redundant - minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles - from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate` - databases are dropped from indri's blumeops-pg as the finalization step. - miniflux + authentik remain on the minikube cluster (later waves). -- Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation. -- Move paperless, teslamate, and mealie off `minikube-indri` onto - `k3s-ringtail`, shedding ~1.1 GiB of resident load from the - OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been - killing `kube-apiserver`/`dockerd`/argocd, flapping every - minikube-hosted service at once). 
paperless + teslamate databases - move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold - `pg_dump`/`pg_restore` from the quiesced source — row counts verified - equal before any routing flip; source DBs dropped only after the - ringtail side serves traffic. mealie's SQLite PVC is copied as-is. - paperless media stays on sifaka NFS. Downtime-tolerant cold cutover - (no streaming replication); rollback is repoint-and-scale-up with the - source untouched. Second chain in the indri-k8s decommission after - [[migrate-immich-to-ringtail]]. -- Recurring maintenance batch: - - - Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`). - - Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks. -- Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention. -- Reviewed `mealie` service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred. -- Deploy shower v1.1.2 — bump container build to new app release. -- Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set `interval = 0` in `up.conf` to restore on-demand polling. -- Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks. -- Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages` - on ringtail. 
Subnautica 2 hangs at Mercuna plugin init under Proton - Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game - compatibility option to work around it. -- Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes - Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat` - lockfiles before each launch. SN2 pops up an invisible (0×0-sized) - Error dialog when it detects an unclean exit, blocking GameThread - forever; this is observable only as a black screen with a spinning - loader. Use via Steam launch option: `sn2-prelaunch %command%`. -- Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`. -- Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events. -- Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403` → `forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. `robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`. -- Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main. -- Bump shower container to v1.1.1 (probe FOD hash). 
-- Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail). -- Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop. -- Deploy shower v1.1.1 to ringtail (kustomize newTag bump). -- Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload). -- Fix three follow-ups from the wave-1 decommission: grant the local - break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin` — - previously only the Authentik `admins` group had access, so admin was - locked out whenever its token expired), and repoint the alloy blackbox - probe for teslamate from the deleted minikube service to - `https://tesla.ops.eblu.me/` (through Caddy over Tailscale). The orphaned - paperless/teslamate roles + ExternalSecrets left on the minikube - blumeops-pg are also cleaned up. -- Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert. -- Pin shower v1.1.1 FOD outputHash (probed locally on ringtail). -- Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes. -- Rebuild and retag alloy v1.16.0 container images from the main-branch SHA - following the squash-merge of #345, per the build-container-image - squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`) - now reference `9564435` rather than the branch SHA `26a3ab5`, restoring - source traceability after branch cleanup. -- Rebuild shower from the post-merge commit on main so the container's - SHA tag points at a commit that will still exist after the 30-day - branch-cleanup window. 
Functionally identical to the branch-tag image - already deployed, just preserves source traceability per - [[build-container-image#Squash-merge and container tags]]. -- Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA). -- Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`. -- Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs. -- Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled` → `tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~5–10s) but actually completes. -- Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down. -- Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. 
`Recreate` terminates the old pod first so the new one acquires the lock cleanly. -- Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup. -- Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump - `nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest. - `nixpkgs-services` was intentionally left pinned (skipped by the - `flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]]. -- Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go - 1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale - MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned, - and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started - in #345 — all four Alloy services (alloy-k8s, alloy-ringtail, - alloy-tracing-ringtail, alloy ansible) now run v1.16.0. -- Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits). - -### Documentation - -- Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter. -- Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs. -- New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync. 
-- Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it. -- Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page. -- Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment. -- Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport--.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`. -- Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`. -- Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests. -- Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`. -- Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card. 
-- Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`. -- Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar. -- rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround. - -### AI Assistance - -- Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path. -- CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally. - -### Miscellaneous - -- Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz`→`docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone. -- Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the - kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so - the branch commit `444ff91` baked into the prior tag isn't reachable - from main's history. The new tag points at a commit that exists on - main; image content is byte-identical because the FOD output is content - addressed and the inputs didn't change. -- Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes. 
-- Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and - TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via - `gethomepage.dev/*` annotations; once these services migrated to ringtail they - were discovered automatically, making their leftover static `services.yaml` - entries (needed only while they lived on minikube) redundant. -- Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name. -- `container-build-and-release` now prints the specific `mise run runner-logs ` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered. -- `mise run runner-logs -j ` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code. 
- - ## [v1.16.0] - 2026-04-18 ### Infrastructure diff --git a/ansible/playbooks/indri.yml b/ansible/playbooks/indri.yml index 1e33bb1..ddb57f8 100644 --- a/ansible/playbooks/indri.yml +++ b/ansible/playbooks/indri.yml @@ -260,7 +260,5 @@ tags: cv - role: docs tags: docs - - role: heph - tags: heph - role: caddy tags: caddy diff --git a/ansible/playbooks/ringtail.yml b/ansible/playbooks/ringtail.yml index b05d67a..ee5604b 100644 --- a/ansible/playbooks/ringtail.yml +++ b/ansible/playbooks/ringtail.yml @@ -57,7 +57,7 @@ tasks: - name: Ensure blumeops repo is present ansible.builtin.git: - repo: "https://forge.ops.eblu.me/eblume/blumeops.git" + repo: "https://forge.eblu.me/eblume/blumeops.git" dest: /etc/blumeops version: "{{ ringtail_commit | default('main') }}" force: true diff --git a/ansible/roles/borgmatic/defaults/main.yml b/ansible/roles/borgmatic/defaults/main.yml index a743161..123cb0f 100644 --- a/ansible/roles/borgmatic/defaults/main.yml +++ b/ansible/roles/borgmatic/defaults/main.yml @@ -56,17 +56,12 @@ borgmatic_k8s_sqlite_dumps: namespace: mealie label_selector: app=mealie db_path: /app/data/mealie.db - # migrated to ringtail (wave-1); ssh to ringtail and run k3s kubectl - # there, same as shower below. - target: ssh:eblume@ringtail + context: minikube - name: shower namespace: shower label_selector: app=shower db_path: /app/data/db.sqlite3 - # ssh to ringtail and run k3s kubectl there — avoids needing a - # ringtail kubeconfig on indri. k3s.yaml on ringtail is - # world-readable (mode 644), so no sudo required. 
- target: ssh:eblume@ringtail + context: k3s-ringtail # Exclude patterns borgmatic_exclude_patterns: [] @@ -103,18 +98,17 @@ borgmatic_postgresql_databases: hostname: pg.ops.eblu.me port: 5432 username: borgmatic + - name: teslamate + hostname: pg.ops.eblu.me + port: 5432 + username: borgmatic - name: authentik hostname: pg.ops.eblu.me port: 5432 username: borgmatic - # migrated to ringtail blumeops-pg (wave-1); port 5434 = Caddy L4 route - - name: teslamate - hostname: pg.ops.eblu.me - port: 5434 - username: borgmatic - name: paperless hostname: pg.ops.eblu.me - port: 5434 + port: 5432 username: borgmatic # immich-pg cluster (VectorChord) via Caddy L4 on port 5433 - name: immich diff --git a/ansible/roles/borgmatic/tasks/main.yml b/ansible/roles/borgmatic/tasks/main.yml index 36d3bb6..eacefa5 100644 --- a/ansible/roles/borgmatic/tasks/main.yml +++ b/ansible/roles/borgmatic/tasks/main.yml @@ -19,10 +19,8 @@ ansible.builtin.copy: content: | # Managed by ansible (borgmatic role) - k8s PostgreSQL backup credentials - # 5432 = minikube blumeops-pg, 5433 = immich-pg, 5434 = ringtail blumeops-pg pg.ops.eblu.me:5432:*:borgmatic:{{ borgmatic_db_password }} pg.ops.eblu.me:5433:*:borgmatic:{{ borgmatic_db_password }} - pg.ops.eblu.me:5434:*:borgmatic:{{ borgmatic_db_password }} dest: ~/.pgpass mode: '0600' no_log: true @@ -51,20 +49,6 @@ mode: '0700' when: borgmatic_k8s_sqlite_dumps | length > 0 -- name: Ensure ~/bin exists - ansible.builtin.file: - path: "{{ ansible_env.HOME }}/bin" - state: directory - mode: '0755' - when: borgmatic_k8s_sqlite_dumps | length > 0 - -- name: Deploy k8s SQLite dump helper script - ansible.builtin.template: - src: k8s-sqlite-dump.sh.j2 - dest: "{{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump" - mode: '0755' - when: borgmatic_k8s_sqlite_dumps | length > 0 - - name: Deploy borgmatic configuration ansible.builtin.template: src: config.yaml.j2 diff --git a/ansible/roles/borgmatic/templates/config.yaml.j2 
b/ansible/roles/borgmatic/templates/config.yaml.j2 index 0893dbc..85804b7 100644 --- a/ansible/roles/borgmatic/templates/config.yaml.j2 +++ b/ansible/roles/borgmatic/templates/config.yaml.j2 @@ -32,20 +32,12 @@ exclude_patterns: encryption_passcommand: {{ borgmatic_encryption_passcommand }} {% if borgmatic_k8s_sqlite_dumps %} -# Pre-backup: dump SQLite databases from k8s pods. -# Uses sqlite3.backup() for a safe, consistent copy. -# -# Quoting/escaping is delegated to ~/bin/borgmatic-k8s-sqlite-dump -# (deployed by the borgmatic ansible role). Each entry's `target` -# is either: -# - local: -> local kubectl with --context (mealie etc.) -# - ssh: -> ssh + k3s kubectl on the cluster host, -# used for ringtail since indri's kubeconfig -# deliberately doesn't carry that context. +# Pre-backup: dump SQLite databases from k8s pods +# Uses sqlite3 .backup for a safe, consistent copy (no corruption from concurrent writes) before_backup: - mkdir -p {{ borgmatic_k8s_dump_dir }} {% for db in borgmatic_k8s_sqlite_dumps %} - - {{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump {{ db.target }} {{ db.namespace }} {{ db.label_selector }} {{ db.db_path }} {{ db.name }} {{ borgmatic_k8s_dump_dir }}/{{ db.name }}.db + - /opt/homebrew/bin/kubectl --context={{ db.context }} exec -n {{ db.namespace }} deploy/{{ db.name }} -- python3 -c "import sqlite3; sqlite3.connect('{{ db.db_path }}').backup(sqlite3.connect('/tmp/{{ db.name }}-backup.db'))" && /opt/homebrew/bin/kubectl --context={{ db.context }} cp {{ db.namespace }}/$(/opt/homebrew/bin/kubectl --context={{ db.context }} get pod -n {{ db.namespace }} -l {{ db.label_selector }} -o jsonpath='{.items[0].metadata.name}'):/tmp/{{ db.name }}-backup.db {{ borgmatic_k8s_dump_dir }}/{{ db.name }}.db {% endfor %} {% endif %} diff --git a/ansible/roles/borgmatic/templates/k8s-sqlite-dump.sh.j2 b/ansible/roles/borgmatic/templates/k8s-sqlite-dump.sh.j2 deleted file mode 100644 index 9cc24da..0000000 --- 
a/ansible/roles/borgmatic/templates/k8s-sqlite-dump.sh.j2 +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/env bash -# {{ ansible_managed }} -# -# Helper script invoked by borgmatic's before_backup hook to capture a -# k8s pod's SQLite database. Keeps the borgmatic config readable by -# pulling all the quoting out of YAML. -# -# Usage: -# borgmatic-k8s-sqlite-dump \ -# -# -# is one of: -# local: - run local kubectl with --context= -# ssh: - ssh to host and run k3s kubectl there -# (no indri-side kubeconfig needed) -# -# - k8s namespace of the pod -# - label selector to find the pod (e.g. app=shower) -# - absolute path inside the pod to the SQLite DB -# - short name used for temp filenames -# - file on this host to receive the dump -set -euo pipefail - -target=${1:?missing target} -namespace=${2:?missing namespace} -selector=${3:?missing selector} -db_path=${4:?missing db path} -name=${5:?missing name} -dump_target=${6:?missing dump target} - -# Stage the backup next to the source DB (a guaranteed-writable volume); -# minimal nix images (e.g. mealie) have no /tmp. -pod_tmp="$(dirname "$db_path")/.borgmatic-backup-${name}.db" - -python_backup='import sqlite3; sqlite3.connect("'"$db_path"'").backup(sqlite3.connect("'"$pod_tmp"'"))' - -mode=${target%%:*} -ref=${target#*:} - -case "$mode" in - local) - # Pulls dump bytes out via "kubectl exec -- cat" rather than - # "kubectl cp", which would otherwise need tar inside the pod - # (nix-built images like shower don't bundle tar). - context=$ref - kubectl="/opt/homebrew/bin/kubectl --context=$context -n $namespace" - pod=$($kubectl get pod -l "$selector" \ - -o jsonpath='{.items[0].metadata.name}') - $kubectl exec "$pod" -- python3 -c "$python_backup" - $kubectl exec "$pod" -- cat "$pod_tmp" > "$dump_target" - $kubectl exec "$pod" -- rm -f "$pod_tmp" - ;; - ssh) - host=$ref - # Force bash on the remote (user's login shell on ringtail is - # fish). Pipe the script via stdin to dodge nested quoting. 
- # The dump bytes come back over the ssh stdout stream — no - # intermediate scp, no tar requirement in the pod. - ssh "$host" bash < "$dump_target" -set -euo pipefail -export KUBECONFIG=/etc/rancher/k3s/k3s.yaml -pod=\$(k3s kubectl -n "$namespace" get pod -l "$selector" -o jsonpath='{.items[0].metadata.name}') -k3s kubectl -n "$namespace" exec "\$pod" -- python3 -c '$python_backup' 1>&2 -k3s kubectl -n "$namespace" exec "\$pod" -- cat "$pod_tmp" -k3s kubectl -n "$namespace" exec "\$pod" -- rm -f "$pod_tmp" 1>&2 -EOF - ;; - *) - echo "borgmatic-k8s-sqlite-dump: unknown target mode: $mode" >&2 - echo " expected local: or ssh:" >&2 - exit 1 - ;; -esac diff --git a/ansible/roles/caddy/defaults/main.yml b/ansible/roles/caddy/defaults/main.yml index e6d7385..da6f3f9 100644 --- a/ansible/roles/caddy/defaults/main.yml +++ b/ansible/roles/caddy/defaults/main.yml @@ -52,9 +52,6 @@ caddy_services: - name: devpi host: "pypi.{{ caddy_domain }}" backend: "http://localhost:3141" - - name: heph - host: "heph.{{ caddy_domain }}" - backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell - name: kiwix host: "kiwix.{{ caddy_domain }}" backend: "https://kiwix.tail8d86e.ts.net" @@ -120,8 +117,6 @@ caddy_tcp_services: backend: "pg.tail8d86e.ts.net:5432" # PostgreSQL (blumeops-pg) - port: 5433 backend: "immich-pg.tail8d86e.ts.net:5432" # PostgreSQL (immich-pg) - - port: 5434 - backend: "blumeops-pg-ringtail.tail8d86e.ts.net:5432" # PostgreSQL (blumeops-pg on ringtail) - port: "{{ sifaka_node_exporter_port }}" backend: "sifaka:{{ sifaka_node_exporter_port }}" # Sifaka node_exporter - port: "{{ sifaka_smartctl_exporter_port }}" diff --git a/ansible/roles/docs/defaults/main.yml b/ansible/roles/docs/defaults/main.yml index a5a1a8a..f09221b 100644 --- a/ansible/roles/docs/defaults/main.yml +++ b/ansible/roles/docs/defaults/main.yml @@ -3,8 +3,9 @@ # Caddy serves docs_content_dir directly via the static-kind service block, # with Quartz-style try_files (path → path/ → 
path.html → 404). -docs_version: "v1.17.0" +docs_version: "v1.16.0" docs_release_url: "https://forge.eblu.me/eblume/blumeops/releases/download/{{ docs_version }}/docs-{{ docs_version }}.tar.gz" + docs_home: /Users/erichblume/blumeops/docs docs_content_dir: "{{ docs_home }}/content" docs_version_sentinel: "{{ docs_home }}/.installed-version" diff --git a/ansible/roles/heph/defaults/main.yml b/ansible/roles/heph/defaults/main.yml deleted file mode 100644 index 88d2240..0000000 --- a/ansible/roles/heph/defaults/main.yml +++ /dev/null @@ -1,49 +0,0 @@ ---- -# hephaestus hub — the canonical heph replica (server mode) on indri. -# Other devices (e.g. gilbert) are spokes that sync against this hub. -# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo. - -# Pinned release used for the initial `cargo install` and the PWA shell. -# After bootstrap, hephd's own --self-update keeps the binary current; this -# pin only governs the first install and the bundled PWA shell version. -heph_version: v1.2.1 - -# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial -# install and unattended self-update build from the same source (no ssh-agent). -heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git - -heph_bin_dir: /Users/erichblume/.cargo/bin -heph_binary: "{{ heph_bin_dir }}/hephd" - -# rustc/cargo here are rustup shims. The bare (non-mise) environment that the -# launchagent and ansible run in falls back to rustup's *default* toolchain, -# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the -# channel explicitly so both the bootstrap build and unattended self-update -# always use a current toolchain regardless of the host's rustup default. 
-heph_rust_toolchain: stable - -heph_data_dir: /Users/erichblume/.local/share/heph -heph_db: "{{ heph_data_dir }}/heph.db" -heph_socket: "{{ heph_data_dir }}/hephd.sock" -heph_log_dir: /Users/erichblume/Library/Logs - -# Version-pinned source checkout; the PWA static shell is served directly from -# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version. -heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src -heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa" - -# Hub listens on all interfaces so tailnet spokes can reach it directly -# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me. -# Access is gated by Authentik OIDC regardless — tailnet reachability is not -# enough (this is the owner's most sensitive data). -heph_http_addr: 0.0.0.0:8787 -heph_port: 8787 -heph_external_url: https://heph.ops.eblu.me - -# Authentik OIDC — issuer + audience together turn hub auth on. The audience is -# the device-code client id (see argocd/manifests/authentik heph blueprint). -heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/ -heph_oidc_audience: heph - -# Self-update poll interval (seconds). 10 minutes. -heph_self_update_interval_secs: 600 diff --git a/ansible/roles/heph/handlers/main.yml b/ansible/roles/heph/handlers/main.yml deleted file mode 100644 index 92fe9d7..0000000 --- a/ansible/roles/heph/handlers/main.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -- name: Restart heph - ansible.builtin.shell: | - launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true - launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist - changed_when: true diff --git a/ansible/roles/heph/tasks/main.yml b/ansible/roles/heph/tasks/main.yml deleted file mode 100644 index 7a45fe3..0000000 --- a/ansible/roles/heph/tasks/main.yml +++ /dev/null @@ -1,82 +0,0 @@ ---- -# hephaestus hub (server mode) on indri. 
-# -# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub -# adopts gilbert's existing data instead of being born empty): -# -# 1. On the seed device (gilbert): heph daemon stop -# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \ -# indri:~/.local/share/heph/heph.db -# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id + -# data; hephd regenerates a fresh origin on next start when it is missing): -# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';" -# 4. Run this role (installs hephd, stages the PWA, loads the launchagent). -# -# hephd auto-creates an empty store on first start if none exists, so seeding is -# optional — skip it only if you intend a fresh, empty hub. - -- name: Ensure heph data directory exists - ansible.builtin.file: - path: "{{ heph_data_dir }}" - state: directory - mode: '0700' - -- name: Check for installed hephd binary - ansible.builtin.stat: - path: "{{ heph_binary }}" - register: heph_binary_stat - -# Bootstrap install only when hephd is absent. Thereafter hephd's own -# --self-update keeps it current; ansible must not fight (or downgrade) it. -# This builds from source and can take several minutes on a cold cargo cache. -- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }}) - ansible.builtin.command: - cmd: >- - {{ heph_bin_dir }}/cargo install --locked - --git {{ heph_repo_url }} - --tag {{ heph_version }} - heph hephd - environment: - PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" - RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}" - when: not heph_binary_stat.stat.exists - changed_when: true - notify: Restart heph - -# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir), -# served directly by hephd. Static files are read from disk per request, so a -# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN") -# evicts stale assets on next load. 
-- name: Ensure heph cache parent directory exists - ansible.builtin.file: - path: "{{ heph_pwa_src_dir | dirname }}" - state: directory - mode: '0755' - -- name: Stage heph-pwa source at {{ heph_version }} - ansible.builtin.git: - repo: "{{ heph_repo_url }}" - dest: "{{ heph_pwa_src_dir }}" - version: "{{ heph_version }}" - depth: 1 - single_branch: true - force: true - -- name: Deploy heph LaunchAgent plist - ansible.builtin.template: - src: heph.plist.j2 - dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist - mode: '0644' - notify: Restart heph - -- name: Check if heph LaunchAgent is loaded - ansible.builtin.command: launchctl list mcquack.eblume.heph - register: heph_launchctl_check - changed_when: false - failed_when: false - -- name: Load heph LaunchAgent if not loaded - ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist - when: heph_launchctl_check.rc != 0 - changed_when: true - failed_when: false diff --git a/ansible/roles/heph/templates/heph.plist.j2 b/ansible/roles/heph/templates/heph.plist.j2 deleted file mode 100644 index 19a2367..0000000 --- a/ansible/roles/heph/templates/heph.plist.j2 +++ /dev/null @@ -1,50 +0,0 @@ - - - - - - Label - mcquack.eblume.heph - ProgramArguments - - {{ heph_binary }} - --mode - server - --http-addr - {{ heph_http_addr }} - --db - {{ heph_db }} - --socket - {{ heph_socket }} - --web-root - {{ heph_web_root }} - --oidc-issuer - {{ heph_oidc_issuer }} - --oidc-audience - {{ heph_oidc_audience }} - --self-update - --self-update-interval-secs - {{ heph_self_update_interval_secs }} - - RunAtLoad - - KeepAlive - - EnvironmentVariables - - - PATH - {{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin - HOME - /Users/erichblume - - RUSTUP_TOOLCHAIN - {{ heph_rust_toolchain }} - - StandardOutPath - {{ heph_log_dir }}/mcquack.heph.out.log - StandardErrorPath - {{ heph_log_dir }}/mcquack.heph.err.log - - diff --git a/argocd/apps/cloudnative-pg-ringtail.yaml 
b/argocd/apps/cloudnative-pg-ringtail.yaml deleted file mode 100644 index fa7bba0..0000000 --- a/argocd/apps/cloudnative-pg-ringtail.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# CloudNativePG Operator for ringtail k3s cluster -# Deploys the operator only; PostgreSQL clusters are created separately -# -# Sibling of cloudnative-pg.yaml (minikube). Same mirror, same release, -# different destination. Both apps will coexist during the immich -# migration; the minikube one is removed at the end of the broader -# indri-k8s decommission. -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: cloudnative-pg-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git - targetRevision: v1.27.1 - path: releases - directory: - include: 'cnpg-1.27.1.yaml' - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: cnpg-system - syncPolicy: - syncOptions: - - CreateNamespace=true - - ServerSideApply=true # Required for large CRDs that exceed annotation size limit diff --git a/argocd/apps/databases-ringtail.yaml b/argocd/apps/databases-ringtail.yaml deleted file mode 100644 index 00de4e3..0000000 --- a/argocd/apps/databases-ringtail.yaml +++ /dev/null @@ -1,26 +0,0 @@ -# Databases on ringtail k3s. -# -# Today: only immich-pg (CNPG Cluster) + its borgmatic ExternalSecret. -# More databases may move here as the indri-k8s decommission proceeds. 
-# -# Prerequisites: -# - cloudnative-pg-ringtail (operator must exist before the Cluster CR) -# - external-secrets-ringtail + 1password-connect-ringtail (for the -# immich-pg-borgmatic ExternalSecret to sync) -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: databases-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/databases-ringtail - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: databases - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/apps/external-secrets-ringtail.yaml b/argocd/apps/external-secrets-ringtail.yaml index 0bb8bd7..e2f5898 100644 --- a/argocd/apps/external-secrets-ringtail.yaml +++ b/argocd/apps/external-secrets-ringtail.yaml @@ -15,7 +15,7 @@ spec: source: repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git targetRevision: main - path: argocd/manifests/external-secrets-ringtail + path: argocd/manifests/external-secrets destination: server: https://ringtail.tail8d86e.ts.net:6443 namespace: external-secrets diff --git a/argocd/apps/immich-ringtail.yaml b/argocd/apps/immich-ringtail.yaml deleted file mode 100644 index c93cbee..0000000 --- a/argocd/apps/immich-ringtail.yaml +++ /dev/null @@ -1,31 +0,0 @@ -# Immich on ringtail k3s. -# -# Staging deployment; the minikube `immich` app remains in parallel -# until cutover. See [[immich-cutover-and-decommission]] for the -# routing flip + minikube cleanup. -# -# Prerequisites: -# - cnpg-on-ringtail + databases-ringtail (postgres) -# - 1password-connect-ringtail + external-secrets-ringtail (not used -# by this app today — immich-db Secret is created manually, -# matching the minikube pattern) -# - The immich-db Secret in the immich namespace, holding the -# password for the `immich` postgres role (copied from the source -# immich-pg-app Secret at migration time). 
-apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: immich-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/immich-ringtail - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: immich - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/apps/immich.yaml b/argocd/apps/immich.yaml new file mode 100644 index 0000000..7efd263 --- /dev/null +++ b/argocd/apps/immich.yaml @@ -0,0 +1,30 @@ +# Immich - Self-hosted photo and video management +# High-performance Google Photos/iCloud alternative with AI features +# +# Kustomize manifests in argocd/manifests/immich/ +# Components: server, machine-learning, valkey (Redis) +# +# Prerequisites: +# 1. Create immich namespace and secrets: +# kubectl create namespace immich +# kubectl --context=minikube-indri create secret generic immich-db -n immich \ +# --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)" +# 2. Create immich-pg database and user (see immich-pg app) +# 3. NFS share on sifaka at /volume1/photos with read/write for indri +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: immich + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/immich + destination: + server: https://kubernetes.default.svc + namespace: immich + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/apps/mealie-ringtail.yaml b/argocd/apps/mealie-ringtail.yaml deleted file mode 100644 index 2f014a9..0000000 --- a/argocd/apps/mealie-ringtail.yaml +++ /dev/null @@ -1,26 +0,0 @@ -# Mealie on ringtail k3s. -# -# Wave-1 indri-k8s decommission. 
Staging deployment; the minikube `mealie` -# app stays in parallel until cutover (copy SQLite PVC, drop the minikube -# tailscale ingress, flip Caddy). See [[migrate-wave1-ringtail]]. -# -# Prerequisites: -# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore) -# - mealie-data PVC contents copied from minikube at cutover -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: mealie-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/mealie-ringtail - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: mealie - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/apps/mealie.yaml b/argocd/apps/mealie.yaml new file mode 100644 index 0000000..af33469 --- /dev/null +++ b/argocd/apps/mealie.yaml @@ -0,0 +1,17 @@ +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: mealie + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/mealie + destination: + server: https://kubernetes.default.svc + namespace: mealie + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/apps/paperless-ringtail.yaml b/argocd/apps/paperless-ringtail.yaml deleted file mode 100644 index bec98e9..0000000 --- a/argocd/apps/paperless-ringtail.yaml +++ /dev/null @@ -1,28 +0,0 @@ -# Paperless-ngx on ringtail k3s. -# -# Wave-1 indri-k8s decommission. Staging deployment; the minikube -# `paperless` app stays in parallel until cutover (drop the minikube -# tailscale ingress to free the name, then flip Caddy). See -# [[migrate-wave1-ringtail]]. 
-# -# Prerequisites: -# - databases-ringtail blumeops-pg (paperless database + role) -# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore) -# - sifaka NFS rule granting ringtail access to /volume1/paperless -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: paperless-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/paperless-ringtail - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: paperless - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/apps/paperless.yaml b/argocd/apps/paperless.yaml new file mode 100644 index 0000000..88437eb --- /dev/null +++ b/argocd/apps/paperless.yaml @@ -0,0 +1,17 @@ +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: paperless + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/paperless + destination: + server: https://kubernetes.default.svc + namespace: paperless + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/apps/teslamate-ringtail.yaml b/argocd/apps/teslamate-ringtail.yaml deleted file mode 100644 index b7b3491..0000000 --- a/argocd/apps/teslamate-ringtail.yaml +++ /dev/null @@ -1,28 +0,0 @@ -# TeslaMate on ringtail k3s. -# -# Wave-1 indri-k8s decommission. Staging deployment; the minikube -# `teslamate` app stays in parallel until cutover (migrate the teslamate -# database, drop the minikube tailscale ingress, flip Caddy). See -# [[migrate-wave1-ringtail]]. 
-# -# Prerequisites: -# - databases-ringtail blumeops-pg (teslamate database + role; cube + -# earthdistance extensions created by superuser at cutover) -# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore) -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: teslamate-ringtail - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/teslamate-ringtail - destination: - server: https://ringtail.tail8d86e.ts.net:6443 - namespace: teslamate - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/apps/teslamate.yaml b/argocd/apps/teslamate.yaml new file mode 100644 index 0000000..60247da --- /dev/null +++ b/argocd/apps/teslamate.yaml @@ -0,0 +1,32 @@ +# TeslaMate Tesla Data Logger +# Requires: CloudNativePG PostgreSQL cluster and manual secret setup +# +# Before syncing, create the namespace and secrets: +# kubectl create namespace teslamate +# op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl | kubectl apply -f - +# op inject -i argocd/manifests/teslamate/secret-encryption-key.yaml.tpl | kubectl apply -f - +# op inject -i argocd/manifests/teslamate/secret-db.yaml.tpl | kubectl apply -f - +# +# Then create the database: +# PGPASSWORD=$(op read "op://blumeops/postgres/password") \ +# psql -h pg.ops.eblu.me -U eblume -c "CREATE DATABASE teslamate OWNER teslamate;" +# +# After syncing, access the TeslaMate UI at https://tesla.tail8d86e.ts.net to complete +# Tesla API authentication via OAuth flow. 
+apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: teslamate + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/teslamate + destination: + server: https://kubernetes.default.svc + namespace: teslamate + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/manifests/alloy-k8s/config.alloy b/argocd/manifests/alloy-k8s/config.alloy index 2940b0b..56a2e13 100644 --- a/argocd/manifests/alloy-k8s/config.alloy +++ b/argocd/manifests/alloy-k8s/config.alloy @@ -191,9 +191,14 @@ prometheus.exporter.blackbox "services" { } target { - // Migrated to ringtail (wave-1); probe through Caddy over Tailscale. name = "teslamate" - address = "https://tesla.ops.eblu.me/" + address = "http://teslamate.teslamate.svc.cluster.local:4000/" + module = "http_2xx" + } + + target { + name = "immich" + address = "http://immich-server.immich.svc.cluster.local:2283/api/server/ping" module = "http_2xx" } diff --git a/argocd/manifests/alloy-ringtail/config.alloy b/argocd/manifests/alloy-ringtail/config.alloy index e5cc045..e92ab0f 100644 --- a/argocd/manifests/alloy-ringtail/config.alloy +++ b/argocd/manifests/alloy-ringtail/config.alloy @@ -45,26 +45,6 @@ prometheus.scrape "kube_state_metrics" { forward_to = [prometheus.remote_write.prometheus.receiver] } -// ============== SERVICE HEALTH PROBES ============== - -// Blackbox-style HTTP probes for in-cluster services on ringtail -prometheus.exporter.blackbox "services" { - config = "{ modules: { http_2xx: { prober: http, timeout: 5s } } }" - - target { - name = "immich" - address = "http://immich-server.immich.svc.cluster.local:2283/api/server/ping" - module = "http_2xx" - } -} - -// Scrape blackbox probe results -prometheus.scrape "blackbox" { - targets = prometheus.exporter.blackbox.services.targets - scrape_interval = "30s" - forward_to = [prometheus.remote_write.prometheus.receiver] -} - 
// Push metrics to indri Prometheus prometheus.remote_write "prometheus" { external_labels = { cluster = "ringtail" } diff --git a/argocd/manifests/argocd/argocd-rbac-cm-patch.yaml b/argocd/manifests/argocd/argocd-rbac-cm-patch.yaml index 4914587..c2ea095 100644 --- a/argocd/manifests/argocd/argocd-rbac-cm-patch.yaml +++ b/argocd/manifests/argocd/argocd-rbac-cm-patch.yaml @@ -2,9 +2,6 @@ # # - workflow-bot: minimal CI/CD permissions (sync, get) # - admins: Authentik admins group mapped to ArgoCD admin role -# - admin: local break-glass account — keeps ArgoCD admin rights for when -# Authentik SSO is unavailable (without this it has no permissions, since -# policy.default is unset) # apiVersion: v1 kind: ConfigMap @@ -17,4 +14,3 @@ data: p, role:workflow-bot, applications, get, *, allow g, workflow-bot, role:workflow-bot g, admins, role:admin - g, admin, role:admin diff --git a/argocd/manifests/authentik/configmap-blueprint.yaml b/argocd/manifests/authentik/configmap-blueprint.yaml index cc97dea..fcbb99b 100644 --- a/argocd/manifests/authentik/configmap-blueprint.yaml +++ b/argocd/manifests/authentik/configmap-blueprint.yaml @@ -434,93 +434,3 @@ data: provider: !KeyOf mealie-provider meta_launch_url: https://meals.ops.eblu.me policy_engine_mode: all - - heph.yaml: | - version: 1 - metadata: - name: BlumeOps Heph SSO - labels: - blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow" - entries: - # Device-code flow (RFC 8628). authentik ships no default for this, so we - # create one and bind it to the brand below. An empty stage_configuration - # flow is sufficient: the already-authenticated user just confirms the code. 
- - model: authentik_flows.flow - id: device-code-flow - identifiers: - slug: default-device-code-flow - attrs: - name: Device code flow - title: Device code flow - slug: default-device-code-flow - designation: stage_configuration - authentication: require_authenticated - - # Enable the device-code grant globally by binding the flow to the default - # brand (domain authentik-default). Partial update — only sets this field. - - model: authentik_brands.brand - identifiers: - domain: authentik-default - attrs: - flow_device_code: !KeyOf device-code-flow - - # OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret). - # client_id doubles as the token audience the hub verifies (--oidc-audience heph), - # and the app slug 'heph' is the issuer path (/application/o/heph/). - - model: authentik_providers_oauth2.oauth2provider - id: heph-provider - identifiers: - name: Heph - attrs: - name: Heph - authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]] - invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]] - client_type: public - client_id: heph - # CLI/TUI use the device-code grant (no redirect). The heph-pwa browser - # login uses Authorization Code + PKCE, which DOES redirect back to the - # app's origin — register those here (Authentik also keys token-endpoint - # CORS off these origins). Trailing slash matters: the PWA's redirect_uri - # is its base dir, e.g. https://heph.ops.eblu.me/. 
- redirect_uris: - - matching_mode: strict - url: https://heph.ops.eblu.me/ - - matching_mode: strict - url: http://localhost:8787/ # local dev (hephd --web-root) - signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]] - property_mappings: - - !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]] - - !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]] - - !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]] - # offline_access: heph CLI requests "openid offline_access"; without - # this mapping the refresh token is session-bound and hephd's - # refresh_token grant 400s once the session lapses (spoke sync dies). - - !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]] - sub_mode: hashed_user_id - include_claims_in_id_token: true - - # Heph application — linked to the OAuth2 provider - - model: authentik_core.application - id: heph-app - identifiers: - slug: heph - attrs: - name: Hephaestus - slug: heph - provider: !KeyOf heph-provider - meta_launch_url: https://heph.ops.eblu.me - policy_engine_mode: any - - # Policy binding — restrict heph to admins group (single-owner, sensitive data) - - model: authentik_policies.policybinding - identifiers: - order: 0 - target: !KeyOf heph-app - group: !Find [authentik_core.group, [name, admins]] - attrs: - target: !KeyOf heph-app - group: !Find [authentik_core.group, [name, admins]] - order: 0 - enabled: true - negate: false - timeout: 30 diff --git a/argocd/manifests/databases-ringtail/blumeops-pg.yaml b/argocd/manifests/databases-ringtail/blumeops-pg.yaml deleted file mode 100644 index 3a37249..0000000 --- a/argocd/manifests/databases-ringtail/blumeops-pg.yaml +++ /dev/null @@ -1,97 +0,0 @@ -# PostgreSQL Cluster for blumeops services on ringtail k3s. -# -# Wave-1 indri-k8s decommission target (see [[migrate-wave1-ringtail]]). 
-# Holds the paperless and teslamate databases migrated off the minikube -# blumeops-pg via cold pg_dump/pg_restore at cutover. miniflux + authentik -# stay where they are for now (later waves), so this cluster only carries -# the wave-1 roles. -# -# Apps reach this in-cluster at blumeops-pg-rw.databases.svc.cluster.local -# — the same name they used on minikube, so teslamate's DATABASE_HOST is -# unchanged. -# -# Database creation is deferred to cutover, mirroring the minikube cluster -# (where only the bootstrap database is declared and the rest were created -# out-of-band): -# - paperless: the bootstrap database below (restored into at cutover). -# - teslamate: created at its cutover by the eblume superuser, because the -# dump's `earthdistance` extension is untrusted and CREATE EXTENSION -# needs superuser. (cube + earthdistance ownership then transferred to -# the teslamate role so it can ALTER EXTENSION UPDATE.) -apiVersion: postgresql.cnpg.io/v1 -kind: Cluster -metadata: - name: blumeops-pg - namespace: databases -spec: - instances: 1 - imageName: ghcr.io/cloudnative-pg/postgresql:18.3 - - storage: - size: 10Gi - storageClass: local-path - - bootstrap: - initdb: - database: paperless - owner: paperless - - managed: - roles: - # eblume superuser for admin + privileged restore steps (extensions) - - name: eblume - login: true - superuser: true - createdb: true - createrole: true - connectionLimit: -1 - ensure: present - inherit: true - passwordSecret: - name: blumeops-pg-eblume - # borgmatic read-only user for backups - - name: borgmatic - login: true - connectionLimit: -1 - ensure: present - inherit: true - inRoles: - - pg_read_all_data - passwordSecret: - name: blumeops-pg-borgmatic - # paperless user (also the bootstrap database owner above; the - # managed role sets its password from the 1Password-backed secret) - - name: paperless - login: true - connectionLimit: -1 - ensure: present - inherit: true - passwordSecret: - name: blumeops-pg-paperless - # 
teslamate user. Extension ownership (cube, earthdistance) is - # transferred to this role at cutover so it can ALTER EXTENSION UPDATE. - - name: teslamate - login: true - connectionLimit: -1 - ensure: present - inherit: true - passwordSecret: - name: blumeops-pg-teslamate - - resources: - requests: - memory: "256Mi" - cpu: "100m" - limits: - memory: "1Gi" - cpu: "500m" - - postgresql: - parameters: - max_connections: "50" - shared_buffers: "128MB" - password_encryption: "scram-sha-256" - pg_hba: - # Password auth from anywhere; network security is via Tailscale. - - host all all 0.0.0.0/0 scram-sha-256 - - host all all ::/0 scram-sha-256 diff --git a/argocd/manifests/databases-ringtail/external-secret-eblume.yaml b/argocd/manifests/databases-ringtail/external-secret-eblume.yaml deleted file mode 100644 index a324c7d..0000000 --- a/argocd/manifests/databases-ringtail/external-secret-eblume.yaml +++ /dev/null @@ -1,30 +0,0 @@ -# ExternalSecret for eblume superuser password -# -# Replaces the manual op inject workflow from secret-eblume.yaml.tpl -# -# 1Password item: "postgres" in blumeops vault -# Field: "password" -# -apiVersion: external-secrets.io/v1 -kind: ExternalSecret -metadata: - name: blumeops-pg-eblume - namespace: databases -spec: - refreshInterval: 1h - secretStoreRef: - kind: ClusterSecretStore - name: onepassword-blumeops - target: - name: blumeops-pg-eblume - creationPolicy: Owner - template: - type: kubernetes.io/basic-auth - data: - username: eblume - password: "{{ .password }}" - data: - - secretKey: password - remoteRef: - key: postgres - property: password diff --git a/argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml b/argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml deleted file mode 100644 index 3d1fc14..0000000 --- a/argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml +++ /dev/null @@ -1,32 +0,0 @@ -# ExternalSecret for borgmatic backup user password on immich-pg cluster -# 
(ringtail k3s). -# -# Mirror of argocd/manifests/databases/external-secret-immich-borgmatic.yaml. -# The onepassword-blumeops ClusterSecretStore exists on ringtail via the -# external-secrets-ringtail app. -# -# 1Password item: "borgmatic" in blumeops vault -# Field: "db-password" -apiVersion: external-secrets.io/v1 -kind: ExternalSecret -metadata: - name: immich-pg-borgmatic - namespace: databases -spec: - refreshInterval: 1h - secretStoreRef: - kind: ClusterSecretStore - name: onepassword-blumeops - target: - name: immich-pg-borgmatic - creationPolicy: Owner - template: - type: kubernetes.io/basic-auth - data: - username: borgmatic - password: "{{ .password }}" - data: - - secretKey: password - remoteRef: - key: borgmatic - property: db-password diff --git a/argocd/manifests/databases-ringtail/immich-pg.yaml b/argocd/manifests/databases-ringtail/immich-pg.yaml deleted file mode 100644 index 982bc43..0000000 --- a/argocd/manifests/databases-ringtail/immich-pg.yaml +++ /dev/null @@ -1,53 +0,0 @@ -# PostgreSQL Cluster for Immich on ringtail k3s. -# -# Initially bootstrapped via CNPG pg_basebackup from the minikube -# immich-pg cluster on 2026-05-13, then promoted to primary. The -# externalClusters + bootstrap.pg_basebackup blocks have been pruned -# from this manifest now that the migration is complete — leaving -# them around is a footgun (re-enabling replica.enabled=true would -# try to demote this cluster against a stale source). See -# [[immich-pg-data-migration]] for the procedure used. 
-apiVersion: postgresql.cnpg.io/v1 -kind: Cluster -metadata: - name: immich-pg - namespace: databases -spec: - instances: 1 - imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0 - - storage: - size: 10Gi - storageClass: local-path - - # Managed roles - managed: - roles: - - name: borgmatic - login: true - connectionLimit: -1 - ensure: present - inherit: true - inRoles: - - pg_read_all_data - passwordSecret: - name: immich-pg-borgmatic - - resources: - requests: - memory: "256Mi" - cpu: "100m" - limits: - memory: "1Gi" - cpu: "500m" - - postgresql: - shared_preload_libraries: - - "vchord.so" - parameters: - max_connections: "50" - shared_buffers: "128MB" - password_encryption: "scram-sha-256" - pg_hba: - - host all all 0.0.0.0/0 scram-sha-256 - - host all all ::/0 scram-sha-256 diff --git a/argocd/manifests/databases-ringtail/kustomization.yaml b/argocd/manifests/databases-ringtail/kustomization.yaml deleted file mode 100644 index 143345c..0000000 --- a/argocd/manifests/databases-ringtail/kustomization.yaml +++ /dev/null @@ -1,16 +0,0 @@ -apiVersion: kustomize.config.k8s.io/v1beta1 -kind: Kustomization - -namespace: databases - -resources: - - immich-pg.yaml - - external-secret-immich-borgmatic.yaml - - service-immich-pg-tailscale.yaml - # wave-1 indri-k8s decommission: blumeops-pg (paperless + teslamate) - - blumeops-pg.yaml - - service-blumeops-pg-tailscale.yaml - - external-secret-eblume.yaml - - external-secret-borgmatic.yaml - - external-secret-paperless.yaml - - external-secret-teslamate.yaml diff --git a/argocd/manifests/databases-ringtail/service-blumeops-pg-tailscale.yaml b/argocd/manifests/databases-ringtail/service-blumeops-pg-tailscale.yaml deleted file mode 100644 index f7ca5ef..0000000 --- a/argocd/manifests/databases-ringtail/service-blumeops-pg-tailscale.yaml +++ /dev/null @@ -1,24 +0,0 @@ -# Tailscale LoadBalancer for the ringtail blumeops-pg cluster. 
-# Canonical hostname: blumeops-pg-ringtail.tail8d86e.ts.net (distinct from -# the minikube blumeops-pg, which still owns pg.tail8d86e.ts.net until the -# wave-1 decommission). Borgmatic on indri and the Grafana TeslaMate -# datasource reach it via the Caddy L4 route pg.ops.eblu.me:5434. -apiVersion: v1 -kind: Service -metadata: - name: blumeops-pg-tailscale - namespace: databases - annotations: - tailscale.com/hostname: "blumeops-pg-ringtail" - tailscale.com/proxy-class: "default" -spec: - type: LoadBalancer - loadBalancerClass: tailscale - selector: - cnpg.io/cluster: blumeops-pg - role: primary - ports: - - name: postgresql - port: 5432 - targetPort: 5432 - protocol: TCP diff --git a/argocd/manifests/databases/blumeops-pg.yaml b/argocd/manifests/databases/blumeops-pg.yaml index 37aef23..58c771a 100644 --- a/argocd/manifests/databases/blumeops-pg.yaml +++ b/argocd/manifests/databases/blumeops-pg.yaml @@ -44,9 +44,18 @@ spec: - pg_read_all_data passwordSecret: name: blumeops-pg-borgmatic - # teslamate + paperless roles removed: migrated to ringtail blumeops-pg - # (wave-1 decommission). Their databases were dropped from this cluster - # after the cutover was verified and backed up. + # teslamate user for TeslaMate Tesla data logger + # Superuser removed. Extension ownership (cube, earthdistance) + # transferred manually so teslamate can ALTER EXTENSION UPDATE. + # earthdistance is untrusted — DROP+CREATE needs temporary + # superuser escalation during upgrades. 
+ - name: teslamate + login: true + connectionLimit: -1 + ensure: present + inherit: true + passwordSecret: + name: blumeops-pg-teslamate # authentik user for Authentik identity provider (runs on ringtail) - name: authentik login: true @@ -56,6 +65,14 @@ spec: createdb: true passwordSecret: name: blumeops-pg-authentik + # paperless user for Paperless-ngx document management + - name: paperless + login: true + connectionLimit: -1 + ensure: present + inherit: true + passwordSecret: + name: blumeops-pg-paperless # Resource limits for minikube environment resources: diff --git a/argocd/manifests/databases-ringtail/external-secret-borgmatic.yaml b/argocd/manifests/databases/external-secret-immich-borgmatic.yaml similarity index 73% rename from argocd/manifests/databases-ringtail/external-secret-borgmatic.yaml rename to argocd/manifests/databases/external-secret-immich-borgmatic.yaml index ee600e3..8801c1a 100644 --- a/argocd/manifests/databases-ringtail/external-secret-borgmatic.yaml +++ b/argocd/manifests/databases/external-secret-immich-borgmatic.yaml @@ -1,14 +1,13 @@ -# ExternalSecret for borgmatic backup user password -# -# Replaces the manual op inject workflow from secret-borgmatic.yaml.tpl +# ExternalSecret for borgmatic backup user password on immich-pg cluster # +# Reuses the same 1Password item as blumeops-pg-borgmatic. 
# 1Password item: "borgmatic" in blumeops vault # Field: "db-password" # apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: - name: blumeops-pg-borgmatic + name: immich-pg-borgmatic namespace: databases spec: refreshInterval: 1h @@ -16,7 +15,7 @@ spec: kind: ClusterSecretStore name: onepassword-blumeops target: - name: blumeops-pg-borgmatic + name: immich-pg-borgmatic creationPolicy: Owner template: type: kubernetes.io/basic-auth diff --git a/argocd/manifests/databases-ringtail/external-secret-paperless.yaml b/argocd/manifests/databases/external-secret-paperless.yaml similarity index 100% rename from argocd/manifests/databases-ringtail/external-secret-paperless.yaml rename to argocd/manifests/databases/external-secret-paperless.yaml diff --git a/argocd/manifests/databases-ringtail/external-secret-teslamate.yaml b/argocd/manifests/databases/external-secret-teslamate.yaml similarity index 100% rename from argocd/manifests/databases-ringtail/external-secret-teslamate.yaml rename to argocd/manifests/databases/external-secret-teslamate.yaml diff --git a/argocd/manifests/databases/immich-pg.yaml b/argocd/manifests/databases/immich-pg.yaml new file mode 100644 index 0000000..74c6f4e --- /dev/null +++ b/argocd/manifests/databases/immich-pg.yaml @@ -0,0 +1,69 @@ +# PostgreSQL Cluster for Immich +# Uses VectorChord (successor to pgvecto.rs) for AI-powered vector search +# See: https://github.com/immich-app/immich/discussions/9060 +# Managed by CloudNativePG operator +apiVersion: postgresql.cnpg.io/v1 +kind: Cluster +metadata: + name: immich-pg + namespace: databases +spec: + instances: 1 + # VectorChord image for PostgreSQL 17 with VectorChord 0.5.0 + # Immich v2.4.1 requires VectorChord >=0.3 <0.6 + # See: https://github.com/tensorchord/VectorChord + imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0 + + storage: + size: 10Gi + storageClass: standard + + # Bootstrap creates initial database and owner + bootstrap: + initdb: + database: immich + 
owner: immich + postInitSQL: + # Extensions required by Immich + - CREATE EXTENSION IF NOT EXISTS vector; + - CREATE EXTENSION IF NOT EXISTS vchord CASCADE; + - CREATE EXTENSION IF NOT EXISTS cube CASCADE; + - CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE; + + # Managed roles + # Note: connectionLimit, ensure, inherit are CNPG defaults added to prevent ArgoCD drift + managed: + roles: + # borgmatic read-only user for backups + - name: borgmatic + login: true + connectionLimit: -1 + ensure: present + inherit: true + inRoles: + - pg_read_all_data + passwordSecret: + name: immich-pg-borgmatic + + # Resource limits for minikube environment + resources: + requests: + memory: "256Mi" + cpu: "100m" + limits: + memory: "1Gi" + cpu: "500m" + + # PostgreSQL configuration + postgresql: + # VectorChord requires vchord.so in shared_preload_libraries + shared_preload_libraries: + - "vchord.so" + parameters: + max_connections: "50" + shared_buffers: "128MB" + password_encryption: "scram-sha-256" + pg_hba: + # Allow connections from k8s pods + - host all all 0.0.0.0/0 scram-sha-256 + - host all all ::/0 scram-sha-256 diff --git a/argocd/manifests/databases/kustomization.yaml b/argocd/manifests/databases/kustomization.yaml index 0393757..b25e09e 100644 --- a/argocd/manifests/databases/kustomization.yaml +++ b/argocd/manifests/databases/kustomization.yaml @@ -5,8 +5,13 @@ namespace: databases resources: - blumeops-pg.yaml + - immich-pg.yaml - service-tailscale.yaml + - service-immich-pg-tailscale.yaml - service-metrics-tailscale.yaml - external-secret-eblume.yaml - external-secret-borgmatic.yaml + - external-secret-immich-borgmatic.yaml + - external-secret-teslamate.yaml - external-secret-authentik.yaml + - external-secret-paperless.yaml diff --git a/argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml b/argocd/manifests/databases/service-immich-pg-tailscale.yaml similarity index 57% rename from 
argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml rename to argocd/manifests/databases/service-immich-pg-tailscale.yaml index 92deb14..78891dd 100644 --- a/argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml +++ b/argocd/manifests/databases/service-immich-pg-tailscale.yaml @@ -1,8 +1,6 @@ -# Tailscale LoadBalancer for immich-pg PostgreSQL access on ringtail. -# Canonical hostname: immich-pg.tail8d86e.ts.net (claimed from the -# minikube side after the minikube service was removed during the -# immich-to-ringtail migration). Borgmatic on indri uses this -# hostname for nightly backups. +# Tailscale LoadBalancer for immich-pg PostgreSQL access +# Canonical hostname: immich-pg.tail8d86e.ts.net +# Caddy L4 proxies pg.ops.eblu.me:5433 → this service for borgmatic backups apiVersion: v1 kind: Service metadata: diff --git a/argocd/manifests/external-secrets-ringtail/kustomization.yaml b/argocd/manifests/external-secrets-ringtail/kustomization.yaml deleted file mode 100644 index 9fd4e2f..0000000 --- a/argocd/manifests/external-secrets-ringtail/kustomization.yaml +++ /dev/null @@ -1,16 +0,0 @@ -# Ringtail (amd64) overlay for external-secrets. -# -# Reuses the shared indri manifest as a base and only overrides the controller -# image to the nix-built amd64 variant (`-nix` tag). The base sets the arm64 -# image (built via containers/external-secrets/container.py on indri's Dagger -# runner); ringtail's k3s is amd64 and needs the image built by -# containers/external-secrets/default.nix on the nix-container-builder. 
-apiVersion: kustomize.config.k8s.io/v1beta1 -kind: Kustomization - -resources: - - ../external-secrets - -images: - - name: registry.ops.eblu.me/blumeops/external-secrets - newTag: v2.2.0-13895bb-nix diff --git a/argocd/manifests/external-secrets/kustomization.yaml b/argocd/manifests/external-secrets/kustomization.yaml index 639db66..574aaa7 100644 --- a/argocd/manifests/external-secrets/kustomization.yaml +++ b/argocd/manifests/external-secrets/kustomization.yaml @@ -12,5 +12,4 @@ resources: images: - name: ghcr.io/external-secrets/external-secrets - newName: registry.ops.eblu.me/blumeops/external-secrets - newTag: v2.2.0-13895bb + newTag: v2.2.0 diff --git a/argocd/manifests/grafana/datasources.yaml b/argocd/manifests/grafana/datasources.yaml index 64ed2bf..5a3d0f3 100644 --- a/argocd/manifests/grafana/datasources.yaml +++ b/argocd/manifests/grafana/datasources.yaml @@ -63,7 +63,5 @@ datasources: password: $TESLAMATE_DB_PASSWORD type: postgres uid: TeslaMate - # teslamate DB migrated to ringtail blumeops-pg (wave-1); reached via the - # Caddy L4 route on indri (pg.ops.eblu.me:5434 -> blumeops-pg-ringtail). - url: pg.ops.eblu.me:5434 + url: blumeops-pg-rw.databases.svc.cluster.local:5432 user: teslamate diff --git a/argocd/manifests/grafana/deployment.yaml b/argocd/manifests/grafana/deployment.yaml index cbba267..0aad9b3 100644 --- a/argocd/manifests/grafana/deployment.yaml +++ b/argocd/manifests/grafana/deployment.yaml @@ -14,9 +14,7 @@ spec: app.kubernetes.io/name: grafana app.kubernetes.io/instance: grafana strategy: - # RWO PVC for SQLite + Bleve index — RollingUpdate spawns the new pod - # before the old one terminates, and it crashloops on the index lock. 
- type: Recreate + type: RollingUpdate template: metadata: labels: diff --git a/argocd/manifests/homepage/services.yaml b/argocd/manifests/homepage/services.yaml index cc1adf4..d552ff2 100644 --- a/argocd/manifests/homepage/services.yaml +++ b/argocd/manifests/homepage/services.yaml @@ -71,6 +71,10 @@ enableBlocks: true enableNowPlaying: false fields: ["movies", "series", "episodes"] + - Mealie: + href: https://meals.ops.eblu.me + icon: mealie.png + description: Recipe manager - DJ: href: https://dj.ops.eblu.me icon: navidrome.png @@ -81,7 +85,15 @@ user: "{{HOMEPAGE_VAR_NAVIDROME_USER}}" token: "{{HOMEPAGE_VAR_NAVIDROME_TOKEN}}" salt: "{{HOMEPAGE_VAR_NAVIDROME_SALT}}" + - Paperless: + href: https://paperless.ops.eblu.me + icon: paperless-ngx.png + description: Document management - Content: + - Immich: + href: https://photos.ops.eblu.me + icon: immich.png + description: Photo management - Kiwix: href: https://kiwix.ops.eblu.me icon: kiwix.png @@ -126,6 +138,10 @@ href: https://docs.eblu.me icon: mdi-book-open-page-variant description: BlumeOps Documentation + - TeslaMate: + href: https://tesla.ops.eblu.me + icon: teslamate.png + description: Tesla data logger - Transmission: href: https://torrent.ops.eblu.me icon: transmission.png diff --git a/argocd/manifests/immich-ringtail/pv-nfs.yaml b/argocd/manifests/immich-ringtail/pv-nfs.yaml deleted file mode 100644 index 3d5a682..0000000 --- a/argocd/manifests/immich-ringtail/pv-nfs.yaml +++ /dev/null @@ -1,29 +0,0 @@ -# NFS PersistentVolume for Immich photo library on ringtail k3s. -# -# Mirror of argocd/manifests/immich/pv-nfs.yaml (minikube) but with -# a distinct name (minikube and ringtail are separate clusters, so PV -# names don't collide cluster-side, but using the same name in two -# manifests is confusing). -# -# The sifaka NFS export for /volume1/photos already permits -# 192.168.1.0/24 + 100.64.0.0/10. Ringtail's wired IP (192.168.1.21) -# falls in the first CIDR, so no DSM rule changes are needed. 
-# -# Verified 2026-05-13: ringtail pod can read existing dirs, write -# new files, and delete them. DNS resolves sifaka to 192.168.1.203 -# (LAN), so NFS traffic stays off the tailnet — avoids the known -# sifaka-tailscale-userspace bite. -apiVersion: v1 -kind: PersistentVolume -metadata: - name: immich-library-nfs-pv-ringtail -spec: - capacity: - storage: 2Ti - accessModes: - - ReadWriteMany - persistentVolumeReclaimPolicy: Retain - storageClassName: "" - nfs: - server: sifaka - path: /volume1/photos diff --git a/argocd/manifests/immich/README.md b/argocd/manifests/immich/README.md new file mode 100644 index 0000000..a82a856 --- /dev/null +++ b/argocd/manifests/immich/README.md @@ -0,0 +1,115 @@ +# Immich + +Self-hosted photo and video management solution with AI-powered search and face recognition. + +## Prerequisites + +1. **NFS Share**: Create `/volume1/photos` on sifaka with NFS permissions for indri +2. **PostgreSQL**: The `immich-pg` cluster (with pgvecto.rs) must be healthy +3. **Secrets**: Create the database password secret + +## Deployment Order + +1. Sync `blumeops-pg` (to get CloudNativePG operator if not already running) +2. Wait for `immich-pg` cluster to be healthy +3. Create secrets (see below) +4. Sync `immich` (deploys all resources: storage, services, deployments) +5. Run `mise run provision-indri -- --tags caddy` to update Caddy config + +## Components + +| Component | Deployment | Service | Port | +|-----------|------------|---------|------| +| Server (web/API) | `immich-server` | `immich-server` | 2283 | +| Machine Learning | `immich-machine-learning` | `immich-machine-learning` | 3003 | +| Valkey (Redis) | `immich-valkey` | `immich-valkey` | 6379 | + +## Secret Setup + +The `immich-db` secret contains the database password, which is auto-generated by CloudNativePG +in the `immich-pg-app` secret. 
To create or regenerate the secret: + +```bash +# Create namespace if needed +kubectl --context=minikube-indri create namespace immich + +# Copy password from CNPG secret to immich namespace +kubectl --context=minikube-indri create secret generic immich-db -n immich \ + --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)" +``` + +Note: This secret is not managed by ExternalSecrets since the source of truth is the CNPG-generated secret. + +## Access + +- **URL**: https://photos.ops.eblu.me (after Caddy is updated) +- **Tailscale**: https://photos.tail8d86e.ts.net (direct) + +## First-Time Setup + +1. Navigate to https://photos.ops.eblu.me +2. Create an admin account +3. Configure external library (optional - for importing existing photos) + +## External Library (iCloud Photos) + +To import existing photos from iCloud sync on indri: + +1. In Immich Admin > External Libraries, create a new library +2. Set the import path to the location where iCloud photos sync +3. Configure scan schedule or trigger manual scan + +## Architecture + +``` +┌─────────────────┐ ┌─────────────────┐ +│ immich-server │────▶│ immich-pg │ +│ (web/api) │ │ (PostgreSQL │ +└────────┬────────┘ │ + pgvecto.rs) │ + │ └─────────────────┘ + │ +┌────────▼────────┐ ┌─────────────────┐ +│ immich-ml │ │ valkey │ +│ (ML inference) │ │ (Redis cache) │ +└─────────────────┘ └─────────────────┘ + │ +┌────────▼────────┐ +│ sifaka NFS │ +│ /volume1/photos│ +└─────────────────┘ +``` + +## Version Management + +Image versions are controlled via `kustomization.yaml`: + +```yaml +images: + - name: ghcr.io/immich-app/immich-server + newTag: v2.6.3 + - name: ghcr.io/immich-app/immich-machine-learning + newTag: v2.6.3 + - name: docker.io/valkey/valkey + newTag: "8.1-alpine" +``` + +To upgrade, update `newTag` values and sync via ArgoCD. 
+ +## Troubleshooting + +```bash +# Check pods +kubectl --context=minikube-indri -n immich get pods + +# Check immich-pg cluster +kubectl --context=minikube-indri -n databases get cluster immich-pg + +# View server logs +kubectl --context=minikube-indri -n immich logs -l app=immich,component=server + +# View ML logs +kubectl --context=minikube-indri -n immich logs -l app=immich,component=machine-learning + +# Check PVC binding +kubectl --context=minikube-indri -n immich get pvc +``` diff --git a/argocd/manifests/immich-ringtail/deployment-ml.yaml b/argocd/manifests/immich/deployment-ml.yaml similarity index 83% rename from argocd/manifests/immich-ringtail/deployment-ml.yaml rename to argocd/manifests/immich/deployment-ml.yaml index 5ea8035..57c4242 100644 --- a/argocd/manifests/immich-ringtail/deployment-ml.yaml +++ b/argocd/manifests/immich/deployment-ml.yaml @@ -16,16 +16,11 @@ spec: app: immich component: machine-learning spec: - runtimeClassName: nvidia securityContext: seccompProfile: type: RuntimeDefault containers: - name: machine-learning - # ringtail uses the -cuda tag (set in kustomization.yaml) - # to take advantage of the RTX 4080 via the nvidia - # device plugin. Time-slicing is configured for 4 replicas - # so frigate + ollama + this pod can share. 
image: ghcr.io/immich-app/immich-machine-learning:kustomized ports: - name: http @@ -62,7 +57,6 @@ spec: cpu: "100m" limits: memory: "4Gi" - nvidia.com/gpu: "1" volumes: - name: cache persistentVolumeClaim: diff --git a/argocd/manifests/immich-ringtail/deployment-server.yaml b/argocd/manifests/immich/deployment-server.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/deployment-server.yaml rename to argocd/manifests/immich/deployment-server.yaml diff --git a/argocd/manifests/immich-ringtail/deployment-valkey.yaml b/argocd/manifests/immich/deployment-valkey.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/deployment-valkey.yaml rename to argocd/manifests/immich/deployment-valkey.yaml diff --git a/argocd/manifests/immich-ringtail/ingress-tailscale.yaml b/argocd/manifests/immich/ingress-tailscale.yaml similarity index 62% rename from argocd/manifests/immich-ringtail/ingress-tailscale.yaml rename to argocd/manifests/immich/ingress-tailscale.yaml index f0b5fe1..59a4c05 100644 --- a/argocd/manifests/immich-ringtail/ingress-tailscale.yaml +++ b/argocd/manifests/immich/ingress-tailscale.yaml @@ -1,9 +1,6 @@ -# Tailscale ProxyGroup Ingress for Immich on ringtail. -# -# Production hostname: photos.tail8d86e.ts.net -# (during the cutover window this was photos-ringtail; the minikube -# ingress was torn down before this was renamed to photos to avoid -# the Tailscale device-name collision.) 
+# Tailscale Ingress for Immich +# Exposes Immich at photos.tail8d86e.ts.net +# Caddy will proxy photos.ops.eblu.me to this endpoint apiVersion: networking.k8s.io/v1 kind: Ingress metadata: @@ -19,6 +16,12 @@ metadata: gethomepage.dev/description: "Photo management" gethomepage.dev/href: "https://photos.ops.eblu.me" gethomepage.dev/pod-selector: "app=immich,component=server" + # TODO: Add Immich widget - requires API key from Account Settings > API Keys + # See: https://gethomepage.dev/widgets/services/immich/ + # gethomepage.dev/widget.type: "immich" + # gethomepage.dev/widget.url: "https://photos.ops.eblu.me" + # gethomepage.dev/widget.key: "{{HOMEPAGE_VAR_IMMICH_API_KEY}}" + # gethomepage.dev/widget.version: "2" spec: ingressClassName: tailscale rules: diff --git a/argocd/manifests/immich-ringtail/kustomization.yaml b/argocd/manifests/immich/kustomization.yaml similarity index 62% rename from argocd/manifests/immich-ringtail/kustomization.yaml rename to argocd/manifests/immich/kustomization.yaml index 2fa131c..5f8d02b 100644 --- a/argocd/manifests/immich-ringtail/kustomization.yaml +++ b/argocd/manifests/immich/kustomization.yaml @@ -1,8 +1,7 @@ +--- apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization - namespace: immich - resources: - deployment-server.yaml - deployment-ml.yaml @@ -14,16 +13,11 @@ resources: - pv-nfs.yaml - pvc.yaml - ingress-tailscale.yaml - images: - name: ghcr.io/immich-app/immich-server newTag: v2.6.3 - name: ghcr.io/immich-app/immich-machine-learning - # CUDA variant of the same release — ringtail has an RTX 4080 - newTag: v2.6.3-cuda - # amd64 valkey built via nix on the ringtail nix-container-builder - # (see containers/valkey/default.nix). The Alpine container.py build - # is arm64-only and serves paperless on indri. 
+ newTag: v2.6.3 - name: docker.io/valkey/valkey newName: registry.ops.eblu.me/blumeops/valkey - newTag: v8.1.7-ecded30-nix + newTag: v8.1.6-r0-fabca04 diff --git a/argocd/manifests/immich/pv-nfs.yaml b/argocd/manifests/immich/pv-nfs.yaml new file mode 100644 index 0000000..0bd6ee2 --- /dev/null +++ b/argocd/manifests/immich/pv-nfs.yaml @@ -0,0 +1,22 @@ +# NFS PersistentVolume for Immich photo library +# Requires: NFS share on sifaka at /volume1/photos with NFS permissions for indri +# +# To create on Synology: +# 1. Control Panel > Shared Folder > Create +# 2. Name: photos, Location: Volume 1 +# 3. Control Panel > File Services > NFS > NFS Rules +# 4. Add rule for "photos" share: Hostname=indri, Privilege=Read/Write, Squash=No mapping +apiVersion: v1 +kind: PersistentVolume +metadata: + name: immich-library-nfs-pv +spec: + capacity: + storage: 2Ti + accessModes: + - ReadWriteMany + persistentVolumeReclaimPolicy: Retain + storageClassName: "" + nfs: + server: sifaka + path: /volume1/photos diff --git a/argocd/manifests/immich-ringtail/pvc-ml-cache.yaml b/argocd/manifests/immich/pvc-ml-cache.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/pvc-ml-cache.yaml rename to argocd/manifests/immich/pvc-ml-cache.yaml diff --git a/argocd/manifests/immich-ringtail/pvc.yaml b/argocd/manifests/immich/pvc.yaml similarity index 54% rename from argocd/manifests/immich-ringtail/pvc.yaml rename to argocd/manifests/immich/pvc.yaml index 5bfc052..c764636 100644 --- a/argocd/manifests/immich-ringtail/pvc.yaml +++ b/argocd/manifests/immich/pvc.yaml @@ -1,5 +1,5 @@ -# PersistentVolumeClaim for Immich photo library on ringtail. -# Binds to immich-library-nfs-pv-ringtail (sifaka:/volume1/photos). 
+# PersistentVolumeClaim for Immich photo library +# Binds to the NFS PV for sifaka:/volume1/photos apiVersion: v1 kind: PersistentVolumeClaim metadata: @@ -9,7 +9,7 @@ spec: accessModes: - ReadWriteMany storageClassName: "" - volumeName: immich-library-nfs-pv-ringtail + volumeName: immich-library-nfs-pv resources: requests: storage: 2Ti diff --git a/argocd/manifests/immich-ringtail/service-ml.yaml b/argocd/manifests/immich/service-ml.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/service-ml.yaml rename to argocd/manifests/immich/service-ml.yaml diff --git a/argocd/manifests/immich-ringtail/service-valkey.yaml b/argocd/manifests/immich/service-valkey.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/service-valkey.yaml rename to argocd/manifests/immich/service-valkey.yaml diff --git a/argocd/manifests/immich-ringtail/service.yaml b/argocd/manifests/immich/service.yaml similarity index 100% rename from argocd/manifests/immich-ringtail/service.yaml rename to argocd/manifests/immich/service.yaml diff --git a/argocd/manifests/mealie-ringtail/deployment.yaml b/argocd/manifests/mealie/deployment.yaml similarity index 89% rename from argocd/manifests/mealie-ringtail/deployment.yaml rename to argocd/manifests/mealie/deployment.yaml index 10d06ab..bdcf91e 100644 --- a/argocd/manifests/mealie-ringtail/deployment.yaml +++ b/argocd/manifests/mealie/deployment.yaml @@ -1,9 +1,3 @@ -# Mealie on ringtail k3s — Nix image. -# -# Single gunicorn process (the Nix image's default `mealie-run` entrypoint -# runs init_db then gunicorn), serving the prebuilt frontend. DB is SQLite -# on the mealie-data PVC; its contents are copied from the minikube PVC at -# cutover. See [[migrate-wave1-ringtail]]. 
apiVersion: apps/v1 kind: Deployment metadata: @@ -11,8 +5,6 @@ metadata: namespace: mealie spec: replicas: 1 - strategy: - type: Recreate selector: matchLabels: app: mealie diff --git a/argocd/manifests/mealie-ringtail/external-secret.yaml b/argocd/manifests/mealie/external-secret.yaml similarity index 100% rename from argocd/manifests/mealie-ringtail/external-secret.yaml rename to argocd/manifests/mealie/external-secret.yaml diff --git a/argocd/manifests/mealie-ringtail/ingress-tailscale.yaml b/argocd/manifests/mealie/ingress-tailscale.yaml similarity index 100% rename from argocd/manifests/mealie-ringtail/ingress-tailscale.yaml rename to argocd/manifests/mealie/ingress-tailscale.yaml diff --git a/argocd/manifests/mealie-ringtail/kustomization.yaml b/argocd/manifests/mealie/kustomization.yaml similarity index 88% rename from argocd/manifests/mealie-ringtail/kustomization.yaml rename to argocd/manifests/mealie/kustomization.yaml index ad65785..fb0713b 100644 --- a/argocd/manifests/mealie-ringtail/kustomization.yaml +++ b/argocd/manifests/mealie/kustomization.yaml @@ -12,4 +12,4 @@ resources: images: - name: registry.ops.eblu.me/blumeops/mealie - newTag: v3.16.0-e0057b4-nix + newTag: v3.12.0-613f05d diff --git a/argocd/manifests/mealie-ringtail/pvc.yaml b/argocd/manifests/mealie/pvc.yaml similarity index 50% rename from argocd/manifests/mealie-ringtail/pvc.yaml rename to argocd/manifests/mealie/pvc.yaml index 89c38ef..f473e07 100644 --- a/argocd/manifests/mealie-ringtail/pvc.yaml +++ b/argocd/manifests/mealie/pvc.yaml @@ -1,5 +1,4 @@ -# SQLite data volume for Mealie on ringtail. Contents copied from the -# minikube mealie-data PVC at cutover (recipes, meal plans, uploaded media). 
+--- apiVersion: v1 kind: PersistentVolumeClaim metadata: @@ -8,7 +7,7 @@ metadata: spec: accessModes: - ReadWriteOnce - storageClassName: local-path + storageClassName: standard resources: requests: storage: 2Gi diff --git a/argocd/manifests/mealie-ringtail/service.yaml b/argocd/manifests/mealie/service.yaml similarity index 100% rename from argocd/manifests/mealie-ringtail/service.yaml rename to argocd/manifests/mealie/service.yaml diff --git a/argocd/manifests/nvidia-device-plugin/kustomization.yaml b/argocd/manifests/nvidia-device-plugin/kustomization.yaml index f5a33ae..a46edf6 100644 --- a/argocd/manifests/nvidia-device-plugin/kustomization.yaml +++ b/argocd/manifests/nvidia-device-plugin/kustomization.yaml @@ -10,4 +10,4 @@ resources: images: - name: nvcr.io/nvidia/k8s-device-plugin - newTag: v0.19.2 + newTag: v0.19.0 diff --git a/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml b/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml index 100e7a9..dee2fd7 100644 --- a/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml +++ b/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml @@ -11,4 +11,4 @@ data: timeSlicing: resources: - name: nvidia.com/gpu - replicas: 4 + replicas: 2 diff --git a/argocd/manifests/paperless-ringtail/pv-nfs.yaml b/argocd/manifests/paperless-ringtail/pv-nfs.yaml deleted file mode 100644 index 2990d1a..0000000 --- a/argocd/manifests/paperless-ringtail/pv-nfs.yaml +++ /dev/null @@ -1,22 +0,0 @@ -# NFS PersistentVolume for the Paperless document library, mounted from -# ringtail. Same sifaka export (/volume1/paperless) as the minikube PV, -# but a distinct PV name so both clusters can declare it during the -# parallel-run before cutover. -# -# Prerequisite: sifaka must have an NFS rule granting ringtail Read/Write -# (Squash=No mapping) on the paperless share — the same step done for -# immich. See [[sifaka-nfs-from-ringtail]]. 
-apiVersion: v1 -kind: PersistentVolume -metadata: - name: paperless-media-nfs-pv-ringtail -spec: - capacity: - storage: 500Gi - accessModes: - - ReadWriteMany - persistentVolumeReclaimPolicy: Retain - storageClassName: "" - nfs: - server: sifaka - path: /volume1/paperless diff --git a/argocd/manifests/paperless-ringtail/deployment.yaml b/argocd/manifests/paperless/deployment.yaml similarity index 53% rename from argocd/manifests/paperless-ringtail/deployment.yaml rename to argocd/manifests/paperless/deployment.yaml index de4f456..cc2c013 100644 --- a/argocd/manifests/paperless-ringtail/deployment.yaml +++ b/argocd/manifests/paperless/deployment.yaml @@ -1,17 +1,3 @@ -# Paperless-ngx on ringtail k3s — Nix image, multi-process. -# -# The upstream s6 image ran web + worker + scheduler + consumer (and DB -# migrations) in one container. The Nix image (containers/paperless/ -# default.nix) ships the binaries but no supervisor, so we run those as -# four containers in one pod, sharing the local data/consume dirs -# (emptyDir) and the NFS media volume; redis is colocated so -# PAPERLESS_REDIS=localhost works for all. A migrate initContainer runs -# DB migrations once before the app containers start. -# -# DB points in-cluster at the ringtail blumeops-pg (was pg.ops.eblu.me on -# indri). PAPERLESS_{DATA_DIR,MEDIA_ROOT,CONSUMPTION_DIR} are set -# explicitly because the Nix package does not default to the upstream -# /usr/src/paperless paths. apiVersion: apps/v1 kind: Deployment metadata: @@ -19,8 +5,6 @@ metadata: namespace: paperless spec: replicas: 1 - strategy: - type: Recreate selector: matchLabels: app: paperless @@ -32,38 +16,27 @@ spec: securityContext: seccompProfile: type: RuntimeDefault - initContainers: - # redis as a native sidecar (restartPolicy: Always): starts before - # the migrate init and stays running for the app containers, so all - # of them reach PAPERLESS_REDIS=localhost:6379. 
- - name: redis - image: docker.io/library/redis:kustomized - restartPolicy: Always - ports: - - containerPort: 6379 - volumeMounts: - - name: redis-data - mountPath: /data - resources: - requests: - memory: "32Mi" - cpu: "10m" - limits: - memory: "128Mi" - - name: migrate + containers: + - name: paperless image: registry.ops.eblu.me/blumeops/paperless:kustomized - command: ["paperless-ngx", "migrate", "--no-input"] - env: &paperless-env + ports: + - containerPort: 8000 + name: http + env: - name: PAPERLESS_URL value: "https://paperless.ops.eblu.me" - name: PAPERLESS_REDIS value: "redis://localhost:6379" - name: PAPERLESS_DBHOST - value: "blumeops-pg-rw.databases.svc.cluster.local" + value: "pg.ops.eblu.me" - name: PAPERLESS_DBPORT value: "5432" - name: PAPERLESS_DBNAME value: "paperless" + # Explicit port to override k8s-injected PAPERLESS_PORT env var + # (k8s sets PAPERLESS_PORT=tcp://... for a service named 'paperless') + - name: PAPERLESS_PORT + value: "8000" - name: PAPERLESS_DBUSER value: "paperless" - name: PAPERLESS_DBPASS @@ -71,16 +44,6 @@ spec: secretKeyRef: name: paperless-secrets key: db-password - # Explicit port to override the k8s-injected PAPERLESS_PORT - # (service named 'paperless' would set PAPERLESS_PORT=tcp://...) 
- - name: PAPERLESS_PORT - value: "8000" - - name: PAPERLESS_DATA_DIR - value: "/usr/src/paperless/data" - - name: PAPERLESS_MEDIA_ROOT - value: "/usr/src/paperless/media" - - name: PAPERLESS_CONSUMPTION_DIR - value: "/usr/src/paperless/consume" - name: PAPERLESS_SECRET_KEY valueFrom: secretKeyRef: @@ -92,6 +55,7 @@ spec: value: "eng" - name: PAPERLESS_TASK_WORKERS value: "1" + # Admin account (created on first startup) - name: PAPERLESS_ADMIN_USER value: "eblume" - name: PAPERLESS_ADMIN_PASSWORD @@ -101,6 +65,8 @@ spec: key: admin-password - name: PAPERLESS_ADMIN_MAIL value: "blume.erich@gmail.com" + # OIDC via Authentik + # Full JSON blob pulled from 1Password (includes client secret) - name: PAPERLESS_APPS value: "allauth.socialaccount.providers.openid_connect" - name: PAPERLESS_SOCIALACCOUNT_PROVIDERS @@ -116,27 +82,19 @@ spec: value: "false" - name: PAPERLESS_REDIRECT_LOGIN_TO_SSO value: "false" - volumeMounts: &paperless-mounts + volumeMounts: - name: data mountPath: /usr/src/paperless/data - name: media mountPath: /usr/src/paperless/media - name: consume mountPath: /usr/src/paperless/consume - containers: - - name: web - image: registry.ops.eblu.me/blumeops/paperless:kustomized - ports: - - containerPort: 8000 - name: http - env: *paperless-env - volumeMounts: *paperless-mounts resources: requests: memory: "256Mi" cpu: "100m" limits: - memory: "1Gi" + memory: "2Gi" cpu: "1000m" livenessProbe: httpGet: @@ -151,42 +109,16 @@ spec: initialDelaySeconds: 30 periodSeconds: 10 - - name: worker - image: registry.ops.eblu.me/blumeops/paperless:kustomized - command: ["celery", "--app", "paperless", "worker", "--loglevel", "INFO"] - env: *paperless-env - volumeMounts: *paperless-mounts + - name: redis + image: docker.io/library/redis:kustomized + ports: + - containerPort: 6379 resources: requests: - memory: "256Mi" - cpu: "100m" + memory: "32Mi" + cpu: "10m" limits: - memory: "1Gi" - cpu: "1000m" - - - name: beat - image: 
registry.ops.eblu.me/blumeops/paperless:kustomized - command: ["celery", "--app", "paperless", "beat", "--loglevel", "INFO"] - env: *paperless-env - volumeMounts: *paperless-mounts - resources: - requests: - memory: "64Mi" - cpu: "20m" - limits: - memory: "256Mi" - - - name: consumer - image: registry.ops.eblu.me/blumeops/paperless:kustomized - command: ["paperless-ngx", "document_consumer"] - env: *paperless-env - volumeMounts: *paperless-mounts - resources: - requests: memory: "128Mi" - cpu: "50m" - limits: - memory: "512Mi" volumes: - name: data @@ -196,6 +128,3 @@ spec: claimName: paperless-media - name: consume emptyDir: {} - - name: redis-data - emptyDir: - sizeLimit: 1Gi diff --git a/argocd/manifests/paperless-ringtail/external-secret.yaml b/argocd/manifests/paperless/external-secret.yaml similarity index 100% rename from argocd/manifests/paperless-ringtail/external-secret.yaml rename to argocd/manifests/paperless/external-secret.yaml diff --git a/argocd/manifests/paperless-ringtail/ingress-tailscale.yaml b/argocd/manifests/paperless/ingress-tailscale.yaml similarity index 100% rename from argocd/manifests/paperless-ringtail/ingress-tailscale.yaml rename to argocd/manifests/paperless/ingress-tailscale.yaml diff --git a/argocd/manifests/paperless-ringtail/kustomization.yaml b/argocd/manifests/paperless/kustomization.yaml similarity index 62% rename from argocd/manifests/paperless-ringtail/kustomization.yaml rename to argocd/manifests/paperless/kustomization.yaml index 41665b8..9c6a086 100644 --- a/argocd/manifests/paperless-ringtail/kustomization.yaml +++ b/argocd/manifests/paperless/kustomization.yaml @@ -13,9 +13,7 @@ resources: images: - name: registry.ops.eblu.me/blumeops/paperless - newTag: v2.20.15-fcac8e5-nix - # amd64 valkey built via nix (the v8.1.7-ecded30 tag without -nix is the - # arm64 Alpine build for indri and fails on ringtail with exec format error) + newTag: v2.20.13-07f52e9 - name: docker.io/library/redis newName: 
registry.ops.eblu.me/blumeops/valkey - newTag: v8.1.7-ecded30-nix + newTag: v8.1.6-r0-fabca04 diff --git a/argocd/manifests/paperless/pv-nfs.yaml b/argocd/manifests/paperless/pv-nfs.yaml new file mode 100644 index 0000000..8ee7526 --- /dev/null +++ b/argocd/manifests/paperless/pv-nfs.yaml @@ -0,0 +1,22 @@ +# NFS PersistentVolume for Paperless document library +# Requires: NFS share on sifaka at /volume1/paperless with NFS permissions for indri +# +# To create on Synology: +# 1. Control Panel > Shared Folder > Create +# 2. Name: paperless, Location: Volume 1 +# 3. Control Panel > File Services > NFS > NFS Rules +# 4. Add rule for "paperless" share: Hostname=indri, Privilege=Read/Write, Squash=No mapping +apiVersion: v1 +kind: PersistentVolume +metadata: + name: paperless-media-nfs-pv +spec: + capacity: + storage: 500Gi + accessModes: + - ReadWriteMany + persistentVolumeReclaimPolicy: Retain + storageClassName: "" + nfs: + server: sifaka + path: /volume1/paperless diff --git a/argocd/manifests/paperless-ringtail/pvc.yaml b/argocd/manifests/paperless/pvc.yaml similarity index 55% rename from argocd/manifests/paperless-ringtail/pvc.yaml rename to argocd/manifests/paperless/pvc.yaml index 8b44660..4365c9f 100644 --- a/argocd/manifests/paperless-ringtail/pvc.yaml +++ b/argocd/manifests/paperless/pvc.yaml @@ -1,5 +1,5 @@ -# PersistentVolumeClaim for the Paperless document library on ringtail. -# Binds the NFS PV for sifaka:/volume1/paperless. 
+# PersistentVolumeClaim for Paperless document library +# Binds to the NFS PV for sifaka:/volume1/paperless apiVersion: v1 kind: PersistentVolumeClaim metadata: @@ -9,7 +9,7 @@ spec: accessModes: - ReadWriteMany storageClassName: "" - volumeName: paperless-media-nfs-pv-ringtail + volumeName: paperless-media-nfs-pv resources: requests: storage: 500Gi diff --git a/argocd/manifests/paperless-ringtail/service.yaml b/argocd/manifests/paperless/service.yaml similarity index 100% rename from argocd/manifests/paperless-ringtail/service.yaml rename to argocd/manifests/paperless/service.yaml diff --git a/argocd/manifests/prowler/mutelist/apiserver.yaml b/argocd/manifests/prowler/mutelist/apiserver.yaml index fd077e8..5a25d4f 100644 --- a/argocd/manifests/prowler/mutelist/apiserver.yaml +++ b/argocd/manifests/prowler/mutelist/apiserver.yaml @@ -6,48 +6,48 @@ Mutelist: "apiserver_always_pull_images_plugin": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Only the operator has cluster access; all images pulled from private zot registry." + Description: "CC: single-user-cluster, local-registry. Only the operator has cluster access; all images pulled from private zot registry." "apiserver_audit_log_maxage_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Alloy/Loki provides pod-level audit trail." + Description: "CC: observability-stack-audit. Alloy/Loki provides pod-level audit trail." "apiserver_audit_log_maxbackup_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Alloy/Loki provides pod-level audit trail." + Description: "CC: observability-stack-audit. Alloy/Loki provides pod-level audit trail." "apiserver_audit_log_maxsize_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Alloy/Loki provides pod-level audit trail." + Description: "CC: observability-stack-audit. Alloy/Loki provides pod-level audit trail." 
"apiserver_audit_log_path_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Alloy/Loki provides pod-level audit trail." + Description: "CC: observability-stack-audit. Alloy/Loki provides pod-level audit trail." "apiserver_deny_service_external_ips": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "No external IPs routable; cluster only reachable via tailnet." + Description: "CC: tailscale-network-isolation. No external IPs routable; cluster only reachable via tailnet." "apiserver_disable_profiling": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Profiling endpoint unreachable from public internet." + Description: "CC: tailscale-network-isolation. Profiling endpoint unreachable from public internet." "apiserver_encryption_provider_config_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Etcd not network-exposed; only operator has node access." + Description: "CC: tailscale-network-isolation, single-user-cluster. Etcd not network-exposed; only operator has node access." "apiserver_kubelet_cert_auth": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Kubelet API not exposed outside the node; minikube auto-generates certificates." + Description: "CC: tailscale-network-isolation. Kubelet API not exposed outside the node; minikube auto-generates certificates." "apiserver_request_timeout_set": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "API server only reachable via tailnet; DoS risk limited to trusted clients." + Description: "CC: tailscale-network-isolation. API server only reachable via tailnet; DoS risk limited to trusted clients." "apiserver_service_account_lookup_true": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "Only operator manages service accounts; no revoked tokens in circulation." + Description: "CC: single-user-cluster. 
Only operator manages service accounts; no revoked tokens in circulation." "apiserver_strong_ciphers_only": Regions: ["*"] Resources: ["^kube-apiserver-minikube$"] - Description: "API server traffic encrypted by WireGuard at the network layer." + Description: "CC: tailscale-network-isolation. API server traffic encrypted by WireGuard at the network layer." diff --git a/argocd/manifests/prowler/mutelist/control-plane.yaml b/argocd/manifests/prowler/mutelist/control-plane.yaml index d3cc34a..2056691 100644 --- a/argocd/manifests/prowler/mutelist/control-plane.yaml +++ b/argocd/manifests/prowler/mutelist/control-plane.yaml @@ -6,12 +6,12 @@ Mutelist: "controllermanager_disable_profiling": Regions: ["*"] Resources: ["^kube-controller-manager-minikube$"] - Description: "Profiling endpoint unreachable from public internet." + Description: "CC: tailscale-network-isolation. Profiling endpoint unreachable from public internet." "scheduler_profiling": Regions: ["*"] Resources: ["^kube-scheduler-minikube$"] - Description: "Profiling endpoint unreachable from public internet." + Description: "CC: tailscale-network-isolation. Profiling endpoint unreachable from public internet." "kubelet_tls_cert_and_key": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "Kubelet API not exposed outside node; minikube auto-generates certificates." + Description: "CC: tailscale-network-isolation, single-user-cluster. Kubelet API not exposed outside node; minikube auto-generates certificates." diff --git a/argocd/manifests/prowler/mutelist/core-pod-security.yaml b/argocd/manifests/prowler/mutelist/core-pod-security.yaml index b1e986e..c39e0c6 100644 --- a/argocd/manifests/prowler/mutelist/core-pod-security.yaml +++ b/argocd/manifests/prowler/mutelist/core-pod-security.yaml @@ -17,8 +17,9 @@ Mutelist: - "^kindnet-" - "^storage-provisioner$" Description: >- - Control-plane and networking pods require hostNetwork by design. - Host network itself is only reachable via tailnet. 
+ CC: tailscale-network-isolation. Control-plane and networking + pods require hostNetwork by design. Host network itself is + only reachable via tailnet. "core_minimize_privileged_containers": Regions: ["*"] Resources: @@ -30,6 +31,7 @@ Mutelist: # Forgejo runner - "^forgejo-runner-" Description: >- + CC: single-user-cluster, operator-managed-pods, trusted-ci-only. kube-proxy: system pod, single-user cluster. ts-*/ingress-*: Tailscale operator-managed. forgejo-runner: DinD limited to trusted private forge repos. @@ -47,24 +49,25 @@ Mutelist: - "^nameserver-" - "^ingress-" Description: >- - System pods managed by minikube and Tailscale operator; - seccomp profiles set by upstream. Single-user cluster limits - exploit surface. + CC: single-user-cluster, operator-managed-pods. System pods + managed by minikube and Tailscale operator; seccomp profiles + set by upstream. Single-user cluster limits exploit surface. "core_minimize_hostPID_containers": Regions: ["*"] Resources: - "^prowler-" Description: >- - Prowler CIS scanner requires hostPID for file permission - checks. Runs as CronJob with 7-day TTL, not a persistent - workload. + CC: ephemeral-privileged-jobs. Prowler CIS scanner requires + hostPID for file permission checks. Runs as CronJob with + 7-day TTL, not a persistent workload. "core_minimize_root_containers_admission": Regions: ["*"] Resources: - "^grafana-" Description: >- - Root limited to init-chown-data container; all runtime - containers run as UID 472 with caps dropped. + CC: init-container-isolation. Root limited to init-chown-data + container; all runtime containers run as UID 472 with caps + dropped. "core_minimize_containers_added_capabilities": Regions: ["*"] Resources: @@ -74,9 +77,10 @@ Mutelist: # Grafana init-chown-data - "^grafana-" Description: >- - System pods: capabilities required by function - (minikube-managed). Grafana: CHOWN limited to init phase; - runtime containers drop ALL. + CC: single-user-cluster, init-container-isolation. 
System + pods: capabilities required by function (minikube-managed). + Grafana: CHOWN limited to init phase; runtime containers + drop ALL. "core_minimize_containers_capabilities_assigned": Regions: ["*"] Resources: @@ -84,4 +88,5 @@ Mutelist: - "^kindnet-" - "^grafana-" Description: >- - See core_minimize_containers_added_capabilities. + CC: single-user-cluster, init-container-isolation. See + core_minimize_containers_added_capabilities. diff --git a/argocd/manifests/prowler/mutelist/manual-node-checks.yaml b/argocd/manifests/prowler/mutelist/manual-node-checks.yaml index c91a2a6..9c8354d 100644 --- a/argocd/manifests/prowler/mutelist/manual-node-checks.yaml +++ b/argocd/manifests/prowler/mutelist/manual-node-checks.yaml @@ -1,7 +1,7 @@ # Node-level and RBAC checks that Prowler reports as MANUAL because it -# cannot evaluate them from inside a pod. Verified out-of-band by the -# node-verification block in `mise run review-compliance-reports`, which -# SSHes into the minikube node and checks each condition directly. +# cannot evaluate them from inside a pod. Compensated by automated +# verification in `mise run review-compliance-reports`, which SSHes into +# the minikube node and checks each condition directly every week. Mutelist: Accounts: "*": @@ -9,51 +9,51 @@ Mutelist: "etcd_unique_ca": Regions: ["*"] Resources: ["^etcd-minikube$"] - Description: "Etcd CA fingerprint verified different from cluster CA by review-compliance-reports." + Description: "CC: node-config-automated-verification. Etcd CA fingerprint verified different from cluster CA by review-compliance-reports." "kubelet_conf_file_ownership": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File ownership verified root:root by review-compliance-reports." + Description: "CC: node-config-automated-verification. File ownership verified root:root by review-compliance-reports." 
"kubelet_conf_file_permissions": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File permissions verified 600 by review-compliance-reports." + Description: "CC: node-config-automated-verification. File permissions verified 600 by review-compliance-reports." "kubelet_config_yaml_ownership": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File ownership verified root:root by review-compliance-reports." + Description: "CC: node-config-automated-verification. File ownership verified root:root by review-compliance-reports." "kubelet_config_yaml_permissions": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File permissions verified 644 by review-compliance-reports." + Description: "CC: node-config-automated-verification. File permissions verified 644 by review-compliance-reports." "kubelet_service_file_ownership_root": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File ownership verified root:root by review-compliance-reports." + Description: "CC: node-config-automated-verification. File ownership verified root:root by review-compliance-reports." "kubelet_service_file_permissions": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "File permissions verified 644 by review-compliance-reports." + Description: "CC: node-config-automated-verification. File permissions verified 644 by review-compliance-reports." "kubelet_disable_read_only_port": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "readOnlyPort absence (defaults to 0) verified by review-compliance-reports." + Description: "CC: node-config-automated-verification. readOnlyPort absence (defaults to 0) verified by review-compliance-reports." "kubelet_event_record_qps": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "eventRecordQPS absence (defaults to 5) verified by review-compliance-reports." + Description: "CC: node-config-automated-verification. 
eventRecordQPS absence (defaults to 5) verified by review-compliance-reports." "kubelet_manage_iptables": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "makeIPTablesUtilChains absence (defaults to true) verified by review-compliance-reports." + Description: "CC: node-config-automated-verification. makeIPTablesUtilChains absence (defaults to true) verified by review-compliance-reports." "kubelet_strong_ciphers_only": Regions: ["*"] Resources: ["^kubelet-config$"] - Description: "Go default ciphers used; all traffic WireGuard-encrypted via tailnet." + Description: "CC: node-config-automated-verification, tailscale-network-isolation. Go default ciphers used; all traffic WireGuard-encrypted via tailnet." "rbac_cluster_admin_usage": Regions: ["*"] Resources: - "^cluster-admin$" - "^kubeadm:cluster-admins$" - "^minikube-rbac$" - Description: "Only built-in/minikube cluster-admin bindings present; verified by review-compliance-reports." + Description: "CC: node-config-automated-verification, single-user-cluster. Only built-in/minikube cluster-admin bindings present; verified by review-compliance-reports." diff --git a/argocd/manifests/prowler/mutelist/rbac.yaml b/argocd/manifests/prowler/mutelist/rbac.yaml index 324809d..c9c52e4 100644 --- a/argocd/manifests/prowler/mutelist/rbac.yaml +++ b/argocd/manifests/prowler/mutelist/rbac.yaml @@ -13,8 +13,9 @@ Mutelist: # ArgoCD - "^argocd-" Description: >- - Built-in K8s roles: only operator can bind them. ArgoCD: - requires broad access but is SSO-gated via Authentik OIDC. + CC: single-user-cluster, sso-gated-admin-tools. Built-in + K8s roles: only operator can bind them. ArgoCD: requires + broad access but is SSO-gated via Authentik OIDC. "rbac_minimize_pod_creation_access": Regions: ["*"] Resources: @@ -25,12 +26,14 @@ Mutelist: # CloudNativePG operator - "^cnpg-manager$" Description: >- - Built-in K8s roles and CNPG operator. Only the operator can - assign these roles; no untrusted users have cluster access. 
+ CC: single-user-cluster. Built-in K8s roles and CNPG + operator. Only the operator can assign these roles; no + untrusted users have cluster access. "rbac_minimize_service_account_token_creation": Regions: ["*"] Resources: - "^system:" Description: >- - kube-controller-manager requires token creation for SA - management. Only operator manages service accounts. + CC: single-user-cluster. kube-controller-manager requires + token creation for SA management. Only operator manages + service accounts. diff --git a/argocd/manifests/prowler/mutelist/trivyignore.yaml b/argocd/manifests/prowler/mutelist/trivyignore.yaml index 87af966..22c612a 100644 --- a/argocd/manifests/prowler/mutelist/trivyignore.yaml +++ b/argocd/manifests/prowler/mutelist/trivyignore.yaml @@ -14,24 +14,26 @@ misconfigurations: paths: - "argocd/manifests/external-secrets/rbac.yaml" statement: >- - external-secrets-operator's entire function is to read and - synthesize Secret objects; ClusterRole over secrets is its - purpose. Both the controller and cert-controller are + CC: operator-purpose-bound-rbac. external-secrets-operator's entire + function is to read and synthesize Secret objects; ClusterRole over + secrets is its purpose. Both the controller and cert-controller are upstream-defined. - id: KSV-0041 paths: - "argocd/manifests/kube-state-metrics/rbac.yaml" - "argocd/manifests/kube-state-metrics-ringtail/rbac.yaml" statement: >- - KSM exposes only Secret metadata (name, namespace, type, labels), - never the data field. list/watch on secrets is required for - kube_secret_info / kube_secret_labels metrics. + CC: kube-state-metrics-metadata-only. KSM exposes only Secret + metadata (name, namespace, type, labels), never the data field. + list/watch on secrets is required for kube_secret_info / + kube_secret_labels metrics. 
- id: KSV-0114 paths: - "argocd/manifests/external-secrets/rbac.yaml" statement: >- - cert-controller manages the external-secrets validating webhook - configurations to inject its own rotating CA bundle. RBAC is - scoped to two named webhooks (secretstore-validate, - externalsecret-validate) via resourceNames; KSV-0114 doesn't see - the resourceNames restriction so reports the full ClusterRole. + CC: operator-purpose-bound-rbac. cert-controller manages the + external-secrets validating webhook configurations to inject its + own rotating CA bundle. RBAC is scoped to two named webhooks + (secretstore-validate, externalsecret-validate) via resourceNames; + KSV-0114 doesn't see the resourceNames restriction so reports the + full ClusterRole. diff --git a/argocd/manifests/shower/kustomization.yaml b/argocd/manifests/shower/kustomization.yaml index 1c29224..0afc8e3 100644 --- a/argocd/manifests/shower/kustomization.yaml +++ b/argocd/manifests/shower/kustomization.yaml @@ -14,4 +14,4 @@ resources: images: - name: registry.ops.eblu.me/blumeops/shower - newTag: v1.1.3-3645098-nix + newTag: v1.0.2-039d9b9-nix diff --git a/argocd/manifests/tailscale-operator-base/kustomization.yaml b/argocd/manifests/tailscale-operator-base/kustomization.yaml index 9d117ef..4519af6 100644 --- a/argocd/manifests/tailscale-operator-base/kustomization.yaml +++ b/argocd/manifests/tailscale-operator-base/kustomization.yaml @@ -6,11 +6,8 @@ namespace: tailscale # Upstream Tailscale operator manifest from forge mirror. # To upgrade: update the ref in the URL AND the newTag below. -# Must use the tailnet host forge.ops.eblu.me — the public forge.eblu.me -# black-holes /mirrors/ at the Fly edge (AI-scraper mitigation), which the -# in-cluster ArgoCD repo-server would otherwise hit and fail with a 403. 
resources: - - https://forge.ops.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml + - https://forge.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml - proxyclass.yaml - dnsconfig.yaml diff --git a/argocd/manifests/teslamate/README.md b/argocd/manifests/teslamate/README.md new file mode 100644 index 0000000..7e1f9fc --- /dev/null +++ b/argocd/manifests/teslamate/README.md @@ -0,0 +1,69 @@ +# TeslaMate + +TeslaMate is a self-hosted Tesla data logger that collects and visualizes vehicle data. + +## Prerequisites + +### 1. Create 1Password Secrets + +Create two items in the blumeops 1Password vault: + +1. **TeslaMate DB Password** + - Generate a secure password for the teslamate PostgreSQL user + - Add a field named `password` with the generated value + +2. **TeslaMate Encryption Key** + - Generate with: `openssl rand -base64 32` + - Add a field named `key` with the generated value + - This encrypts Tesla API tokens at rest in the database + +### 2. Apply Kubernetes Secrets + +```bash +# Create namespace +kubectl create namespace teslamate + +# Apply database user secret (for CNPG) +op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl | kubectl apply -f - + +# Apply teslamate secrets +op inject -i argocd/manifests/teslamate/secret-encryption-key.yaml.tpl | kubectl apply -f - +op inject -i argocd/manifests/teslamate/secret-db.yaml.tpl | kubectl apply -f - +``` + +### 3. Create Database + +After the teslamate user exists in PostgreSQL (sync blumeops-pg first): + +```bash +PGPASSWORD=$(op read "op://blumeops/postgres/password") \ + psql -h pg.ops.eblu.me -U eblume -c "CREATE DATABASE teslamate OWNER teslamate;" +``` + +## Deployment + +```bash +# Sync ArgoCD apps +argocd app sync apps +argocd app sync blumeops-pg teslamate grafana grafana-config +``` + +## Tesla API Setup + +1. Access TeslaMate UI at https://tesla.tail8d86e.ts.net +2. Click "Sign in with Tesla" +3. 
Complete OAuth flow in browser +4. Tokens are encrypted and stored in database +5. Verify vehicle appears and data collection starts + +## Grafana Dashboards + +TeslaMate dashboards are available in Grafana at https://grafana.tail8d86e.ts.net + +They use the "TeslaMate" PostgreSQL datasource (not Prometheus). + +## Notes + +- MQTT is disabled (can be enabled later for Home Assistant integration) +- Timezone is set to America/Los_Angeles +- Encryption key protects Tesla API tokens at rest diff --git a/argocd/manifests/teslamate-ringtail/deployment.yaml b/argocd/manifests/teslamate/deployment.yaml similarity index 81% rename from argocd/manifests/teslamate-ringtail/deployment.yaml rename to argocd/manifests/teslamate/deployment.yaml index cf8cc73..42859a7 100644 --- a/argocd/manifests/teslamate-ringtail/deployment.yaml +++ b/argocd/manifests/teslamate/deployment.yaml @@ -1,10 +1,3 @@ -# TeslaMate on ringtail k3s — Nix image. -# -# The Nix image's Entrypoint waits for postgres, runs migrations -# (TeslaMate.Release.migrate), then starts the release — so no command -# override is needed. Stateless; all data lives in the teslamate database -# on the ringtail blumeops-pg (DATABASE_HOST already an in-cluster name, -# unchanged from minikube). See [[migrate-wave1-ringtail]]. 
apiVersion: apps/v1 kind: Deployment metadata: diff --git a/argocd/manifests/teslamate-ringtail/external-secret-db.yaml b/argocd/manifests/teslamate/external-secret-db.yaml similarity index 100% rename from argocd/manifests/teslamate-ringtail/external-secret-db.yaml rename to argocd/manifests/teslamate/external-secret-db.yaml diff --git a/argocd/manifests/teslamate-ringtail/external-secret-encryption-key.yaml b/argocd/manifests/teslamate/external-secret-encryption-key.yaml similarity index 100% rename from argocd/manifests/teslamate-ringtail/external-secret-encryption-key.yaml rename to argocd/manifests/teslamate/external-secret-encryption-key.yaml diff --git a/argocd/manifests/teslamate-ringtail/ingress-tailscale.yaml b/argocd/manifests/teslamate/ingress-tailscale.yaml similarity index 100% rename from argocd/manifests/teslamate-ringtail/ingress-tailscale.yaml rename to argocd/manifests/teslamate/ingress-tailscale.yaml diff --git a/argocd/manifests/teslamate-ringtail/kustomization.yaml b/argocd/manifests/teslamate/kustomization.yaml similarity index 90% rename from argocd/manifests/teslamate-ringtail/kustomization.yaml rename to argocd/manifests/teslamate/kustomization.yaml index acb623e..a00586f 100644 --- a/argocd/manifests/teslamate-ringtail/kustomization.yaml +++ b/argocd/manifests/teslamate/kustomization.yaml @@ -12,4 +12,4 @@ resources: images: - name: registry.ops.eblu.me/blumeops/teslamate - newTag: v3.0.0-fcac8e5-nix + newTag: v3.0.0-08c698e diff --git a/argocd/manifests/teslamate-ringtail/service.yaml b/argocd/manifests/teslamate/service.yaml similarity index 100% rename from argocd/manifests/teslamate-ringtail/service.yaml rename to argocd/manifests/teslamate/service.yaml diff --git a/argocd/manifests/unpoller/kustomization.yaml b/argocd/manifests/unpoller/kustomization.yaml index bf776bb..5b7a9e2 100644 --- a/argocd/manifests/unpoller/kustomization.yaml +++ b/argocd/manifests/unpoller/kustomization.yaml @@ -10,7 +10,7 @@ resources: images: - name: 
registry.ops.eblu.me/blumeops/unpoller - newTag: v3.2.0-4d1f4af + newTag: v2.34.0-613f05d configMapGenerator: - name: unpoller-config diff --git a/compensating-controls.yaml b/compensating-controls.yaml new file mode 100644 index 0000000..658c99d --- /dev/null +++ b/compensating-controls.yaml @@ -0,0 +1,206 @@ +# Compensating Controls +# +# Documents controls that mitigate risks from suppressed or accepted security +# findings. Referenced by security tools (Prowler mutelist, Kingfisher config, +# etc.) via "CC: " in finding descriptions or suppression notes. +# +# Used by `mise run review-compensating-controls` to surface stale controls. +# +# Fields: +# id - kebab-case unique identifier, referenced from tool configs +# description - what the control actually does to mitigate risk +# created - date (YYYY-MM-DD) the control was documented +# last-reviewed - date (YYYY-MM-DD) or null +# notes - optional context + +controls: + - id: single-user-cluster + description: >- + Only the cluster operator (eblume) has kubectl access. No untrusted + users can create pods, access cached images, or bind RBAC roles. + created: 2026-03-30 + last-reviewed: 2026-04-01 + notes: >- + Verify by checking kubeconfig distribution and Tailscale ACLs. + If additional users gain cluster access, re-evaluate all findings + muted under this control. + + - id: tailscale-network-isolation + description: >- + Cluster is not internet-exposed. All access requires Tailscale + identity with ACL enforcement. Profiling endpoints, debug ports, + and control-plane APIs are unreachable from the public internet. + created: 2026-03-30 + last-reviewed: 2026-04-06 + notes: >- + Verify with 'tailscale serve status --json' on indri and review + Tailscale ACLs in pulumi/tailscale/. Only tag:flyio-target services + are publicly routable. + + - id: local-registry + description: >- + Operator-built services use a private zot registry + (registry.ops.eblu.me) for supply-chain control. 
Remaining + images are pulled from public registries without stored + credentials. No shared registry secrets are cached on cluster + nodes. + created: 2026-03-30 + last-reviewed: 2026-04-12 + notes: >- + Verify by checking image prefixes in kustomization.yaml files. + Known external-image categories: (1) upstream apps not yet + mirrored — immich, ollama, frigate, frigate-notify, valkey; + (2) infrastructure components — tailscale operator/proxy, + external-secrets, 1password-connect, forgejo-runner, docker + DinD, nvidia-device-plugin; (3) utility base images — busybox, + alpine (grafana init containers). Track upstream versions in + service-versions.yaml. Goal is to progressively mirror these + into zot. + + - id: sso-gated-admin-tools + description: >- + ArgoCD requires SSO authentication via Authentik OIDC. Wildcard + RBAC roles are mitigated by requiring authenticated identity + before any API access. + created: 2026-03-30 + last-reviewed: 2026-04-14 + notes: >- + Verify Authentik OIDC provider config for ArgoCD and that + anonymous access is disabled. Check ArgoCD --auth-token isn't + leaked. The workflow-bot API key account is scoped to sync/get + only. + + - id: operator-managed-pods + description: >- + Tailscale operator manages proxy pod specs (ts-*, ingress-*, + operator-*, nameserver-*). Pod security settings are set by the + operator, not user manifests. Operator is tracked in + service-versions.yaml and regularly updated. + created: 2026-03-30 + last-reviewed: 2026-04-21 + notes: >- + Verify operator version is current via 'mise run service-review'. + Check Tailscale changelog for security fixes. If operator adds + seccomp support, remove these mutes. As of 2026-04-21: still no + default seccomp on operator-generated pods (upstream issue #7359 + open). 
A ProxyClass + generic device plugin can downgrade proxies + from privileged to NET_ADMIN+NET_RAW and set seccompProfile — + potential future remediation to remove the seccomp mute without + waiting for upstream defaults. + + - id: ephemeral-privileged-jobs + description: >- + Prowler CIS scanner runs as a CronJob with 7-day TTL + auto-deletion, not as a persistent privileged workload. hostPID + exposure is time-bounded to scan duration (~20s). + created: 2026-03-30 + last-reviewed: 2026-04-29 + notes: >- + Verify TTL is set in cronjob.yaml. Check that no persistent + pods run with hostPID on the scanned cluster (indri). The + alloy-tracing DaemonSet on ringtail also uses hostPID but is + out of scope — Prowler only scans indri. Tracked in Todoist: + "prowler scan against ringtail" — once that lands, the + DaemonSet's hostPID+privileged posture will surface as a CIS + finding and need its own CC or remediation. + + - id: trusted-ci-only + description: >- + Forgejo runner only executes workflows from repos on the private + forge (forge.ops.eblu.me). No external or untrusted repos can + trigger privileged CI jobs. + created: 2026-03-30 + last-reviewed: 2026-05-01 + notes: >- + Verification: (1) Runner config (argocd/manifests/forgejo-runner/ + config.yaml) connects only to https://forge.ops.eblu.me/. (2) Forge + app.ini has DISABLE_REGISTRATION=true and ALLOW_ONLY_EXTERNAL_REGISTRATION + =true (ansible/roles/forgejo/defaults/main.yml) — no untrusted users + can sign up or create repos. The runner registers at instance scope + (repo_id=0/owner_id=0 in action_runner table), but the instance itself + is closed, so no per-repo allow-list is needed. Re-evaluate if the + forge ever opens to additional users or if the runner is repointed + to an external forge. + + - id: init-container-isolation + description: >- + Root privileges and added capabilities (CHOWN) are limited to + init containers that run once at pod startup. 
All runtime + containers run as non-root (UID 472) with all capabilities + dropped. + created: 2026-03-30 + last-reviewed: 2026-05-04 + notes: >- + Verify by inspecting grafana deployment.yaml securityContext + for both init and runtime containers. If fsGroup alone can + handle PVC ownership, remove init-chown-data and this control. + Retirement deferred until grafana lands on ringtail's k3s + (see [[indri-k8s-migration]]) — storage backend will change, + and removing init-chown-data right before that migration + trades a real safety net for marginal cleanup. Revisit + post-migration. + + - id: node-config-automated-verification + description: >- + Prowler reports certain node-level checks as MANUAL because it runs + inside a pod and cannot evaluate kubelet file permissions, kubelet + config arguments, etcd CA separation, or cluster-admin RBAC bindings. + The review-compliance-reports script SSHes into the minikube node + weekly and programmatically verifies each condition, failing loudly + if any check deviates from expected values. + created: 2026-04-14 + last-reviewed: 2026-04-14 + notes: >- + Verification runs as part of 'mise run review-compliance-reports'. + If minikube node is unreachable, all checks report as FAIL. If new + MANUAL findings appear in Prowler, add corresponding verification + logic to the script and update the mutelist. + + - id: operator-purpose-bound-rbac + description: >- + Operators whose entire function is to manage a sensitive resource + legitimately need RBAC over that resource. external-secrets-operator + manages Secret objects (its purpose) and the cert-controller mutates + its own ValidatingWebhookConfigurations to inject rotating CA bundles. + Risk is bounded by: (1) the operator code being upstream open-source + and reviewed; (2) RBAC scoped to specific named webhooks where + possible; (3) supply chain controls on the operator image (mirrored + to local registry, version tracked in service-versions.yaml). 
+ created: 2026-04-27 + last-reviewed: 2026-04-27 + notes: >- + Verify by checking that the operators in question still match their + stated purpose (i.e. external-secrets is still the only consumer of + these ClusterRoles) and that upstream hasn't published advisories + for credential-handling bugs. Re-evaluate if a non-secrets-managing + ClusterRole appears under this control. + + - id: kube-state-metrics-metadata-only + description: >- + kube-state-metrics holds list/watch on Secrets cluster-wide but only + exposes Secret object *metadata* (name, namespace, type, creation + timestamp, labels) via the kube_secret_info / kube_secret_labels + metrics. Secret data fields are never read into KSM's exposed + metrics by upstream design. Mitigation rests on KSM's metric + schema, the version pin in service-versions.yaml, and the metrics + endpoint being reachable only on the cluster network. + created: 2026-04-27 + last-reviewed: 2026-04-27 + notes: >- + Verify by inspecting the /metrics endpoint output for any series + that include secret data (only *_info and *_labels metrics should + reference secrets, and labels should be limited to user-applied + labels — never the data:). Re-evaluate on KSM version bumps. + + - id: observability-stack-audit + description: >- + Alloy collects pod logs and ships them to Loki, providing an + audit trail for cluster activity. Compensates for missing + apiserver audit logging which minikube does not configure. + created: 2026-03-30 + last-reviewed: 2026-03-30 + notes: >- + Verify Alloy DaemonSet is running and Loki is receiving logs. + Note this is weaker than native apiserver audit logs — it + captures pod stdout/stderr, not API request-level auditing. + Consider enabling minikube audit logging if supported. 
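Reviewer note: the header of compensating-controls.yaml says the file is consumed by `mise run review-compensating-controls` to surface stale controls, but that task is not part of this diff. The following is a minimal sketch of what such a staleness check might look like, not the actual implementation. The 30-day `MAX_AGE` window, the `stale_controls` helper, and the fallback from a null `last-reviewed` to `created` are all assumptions made for illustration. The sample entries are copied from the file above, written the way a YAML loader would typically present them (dates as `datetime.date`).

```python
from datetime import date, timedelta

# Hypothetical threshold: the real task's staleness window is not shown in this diff.
MAX_AGE = timedelta(days=30)

# A few entries copied from compensating-controls.yaml (ids and dates only).
CONTROLS = [
    {"id": "single-user-cluster", "created": date(2026, 3, 30), "last-reviewed": date(2026, 4, 1)},
    {"id": "trusted-ci-only", "created": date(2026, 3, 30), "last-reviewed": date(2026, 5, 1)},
    {"id": "observability-stack-audit", "created": date(2026, 3, 30), "last-reviewed": date(2026, 3, 30)},
]


def stale_controls(controls, today, max_age=MAX_AGE):
    """Return ids of controls whose latest review (or creation date, if the
    control was never reviewed) is older than max_age, oldest first."""

    def last_seen(control):
        # Assumption: a null last-reviewed falls back to the creation date.
        return control.get("last-reviewed") or control["created"]

    stale = [c for c in controls if today - last_seen(c) > max_age]
    return [c["id"] for c in sorted(stale, key=last_seen)]


if __name__ == "__main__":
    for control_id in stale_controls(CONTROLS, today=date(2026, 5, 10)):
        print(f"STALE: {control_id}")
```

Passing `today` explicitly rather than calling `date.today()` inside the helper keeps the check deterministic, which would make it straightforward to exercise in CI if the real task is structured similarly.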
diff --git a/containers/external-secrets/container.py b/containers/external-secrets/container.py deleted file mode 100644 index 6be5765..0000000 --- a/containers/external-secrets/container.py +++ /dev/null @@ -1,51 +0,0 @@ -"""External Secrets Operator — native Dagger build. - -Two-stage build: Go binary (all providers), Alpine runtime. -Source cloned from forge mirror. - -A single binary serves as the controller, webhook, and cert-controller; the -Deployments select the role via a subcommand passed in `args:`, so the image -ENTRYPOINT must be the binary itself (matching upstream's distroless image). -""" - -import dagger - -from blumeops.containers import ( - alpine_runtime, - clone_from_forge, - go_build, - oci_labels, -) - -VERSION = "v2.2.0" - - -async def build(src: dagger.Directory) -> dagger.Container: - source = clone_from_forge("external-secrets", VERSION) - - # Upstream `make build` compiles every secret provider into a single - # static binary (`-tags all_providers`, CGO disabled). Mirror that so the - # local image is functionally identical to ghcr.io/.../external-secrets. - backend = go_build( - source, - "/external-secrets", - tags="all_providers", - ) - - runtime = alpine_runtime( - extra_apk=["ca-certificates"], - create_user=False, - ) - runtime = oci_labels( - runtime, - title="External Secrets Operator", - description=( - "Kubernetes operator that integrates external secret management systems" - ), - version=VERSION, - ) - return ( - runtime.with_file("/bin/external-secrets", backend.file("/external-secrets")) - .with_user("65534") - .with_entrypoint(["/bin/external-secrets"]) - ) diff --git a/containers/external-secrets/default.nix b/containers/external-secrets/default.nix deleted file mode 100644 index eabe03d..0000000 --- a/containers/external-secrets/default.nix +++ /dev/null @@ -1,56 +0,0 @@ -# Nix-built External Secrets Operator (amd64, for ringtail k3s). 
-# Builds v2.2.0 from the forge mirror with all secret providers compiled in, -# faithful to upstream's `make build` (-tags all_providers). The container.py -# sibling builds the arm64 image for indri's minikube; this default.nix builds -# the amd64 image on ringtail's nix-container-builder. -{ pkgs ? import { } }: - -let - version = "2.2.0"; - - src = pkgs.fetchgit { - url = "https://forge.ops.eblu.me/mirrors/external-secrets.git"; - rev = "v${version}"; - hash = "sha256-eAocOAp5s4CFRrpKfQr2lf3Ji+6nQQ1A5/eTw5B7v9U="; - }; - - # external-secrets v2.2.0 requires Go >= 1.26.1; nixpkgs default go is 1.25.x. - external-secrets = (pkgs.buildGoModule.override { go = pkgs.go_1_26; }) { - inherit src version; - pname = "external-secrets"; - vendorHash = "sha256-0xuBK3fjAplPLAElHvKB6d+2lDz+De/s91fV4dPZwjE="; - - doCheck = false; - - subPackages = [ "." ]; - - tags = [ "all_providers" ]; - - ldflags = [ "-s" "-w" ]; - - meta = with pkgs.lib; { - description = "Kubernetes operator that integrates external secret management systems"; - homepage = "https://github.com/external-secrets/external-secrets"; - license = licenses.asl20; - mainProgram = "external-secrets"; - }; - }; -in - -pkgs.dockerTools.buildLayeredImage { - name = "blumeops/external-secrets"; - contents = [ - external-secrets - pkgs.cacert - pkgs.tzdata - ]; - - config = { - Entrypoint = [ "${external-secrets}/bin/external-secrets" ]; - Env = [ - "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" - "TZDIR=${pkgs.tzdata}/share/zoneinfo" - ]; - User = "65534"; - }; -} diff --git a/containers/mealie/Dockerfile b/containers/mealie/Dockerfile new file mode 100644 index 0000000..8df38bf --- /dev/null +++ b/containers/mealie/Dockerfile @@ -0,0 +1,145 @@ +# Mealie — self-hosted recipe manager +# Built from source via forge mirror of mealie-recipes/mealie +# Based on upstream docker/Dockerfile (multi-stage: Node frontend + Python backend) + +ARG CONTAINER_APP_VERSION=v3.12.0 + 
+############################################### +# Frontend Build +############################################### +FROM node:24-slim AS frontend-builder + +ARG CONTAINER_APP_VERSION +RUN apt-get update && apt-get install --no-install-recommends -y git ca-certificates && rm -rf /var/lib/apt/lists/* + +RUN git clone --depth 1 --branch ${CONTAINER_APP_VERSION} \ + https://forge.ops.eblu.me/mirrors/mealie.git /src + +WORKDIR /src/frontend + +RUN yarn install \ + --prefer-offline \ + --frozen-lockfile \ + --non-interactive \ + --production=false \ + --network-timeout 1000000 + +RUN yarn generate + +############################################### +# Python Base +############################################### +FROM python:3.12-slim AS python-base + +ENV MEALIE_HOME="/app" +ENV PYTHONUNBUFFERED=1 \ + PYTHONDONTWRITEBYTECODE=1 \ + PIP_NO_CACHE_DIR=off \ + PIP_DISABLE_PIP_VERSION_CHECK=on \ + PIP_DEFAULT_TIMEOUT=100 \ + VENV_PATH="/opt/mealie" + +ENV PATH="$VENV_PATH/bin:$PATH" + +RUN useradd -u 911 -U -d $MEALIE_HOME -s /bin/bash abc \ + && usermod -G users abc \ + && mkdir $MEALIE_HOME + +############################################### +# Backend Package Build +############################################### +FROM python-base AS backend-builder + +ARG CONTAINER_APP_VERSION +RUN apt-get update \ + && apt-get install --no-install-recommends -y curl git ca-certificates \ + && rm -rf /var/lib/apt/lists/* + +RUN pip install uv + +RUN git clone --depth 1 --branch ${CONTAINER_APP_VERSION} \ + https://forge.ops.eblu.me/mirrors/mealie.git /src + +WORKDIR /src + +COPY --from=frontend-builder /src/frontend/dist ./mealie/frontend + +RUN uv build --out-dir dist + +RUN uv export --no-editable --no-emit-project --extra pgsql --format requirements-txt --output-file dist/requirements.txt \ + && MEALIE_VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])") \ + && echo "mealie[pgsql]==${MEALIE_VERSION} \\" >> dist/requirements.txt \ 
+ && pip hash dist/mealie-${MEALIE_VERSION}-py3-none-any.whl | tail -n1 | tr -d '\n' >> dist/requirements.txt \ + && echo " \\" >> dist/requirements.txt \ + && pip hash dist/mealie-${MEALIE_VERSION}.tar.gz | tail -n1 >> dist/requirements.txt + +############################################### +# Python Venv Build +############################################### +FROM python-base AS venv-builder + +RUN apt-get update \ + && apt-get install --no-install-recommends -y \ + build-essential \ + libpq-dev \ + libwebp-dev \ + ffmpeg \ + libsasl2-dev libldap2-dev libssl-dev \ + gnupg gnupg2 gnupg1 \ + && rm -rf /var/lib/apt/lists/* + +RUN python3 -m venv --upgrade-deps $VENV_PATH + +COPY --from=backend-builder /src/dist /dist + +RUN . $VENV_PATH/bin/activate \ + && pip install --require-hashes -r /dist/requirements.txt --find-links /dist + +############################################### +# Production Image +############################################### +FROM python-base AS production + +ENV PRODUCTION=true +ENV TESTING=false + +RUN apt-get update \ + && apt-get install --no-install-recommends -y \ + curl \ + ffmpeg \ + gosu \ + iproute2 \ + libldap-common \ + libldap2 \ + && rm -rf /var/lib/apt/lists/* + +RUN mkdir -p /run/secrets + +COPY --from=venv-builder $VENV_PATH $VENV_PATH + +ENV NLTK_DATA="/nltk_data/" +RUN mkdir -p $NLTK_DATA +RUN python -m nltk.downloader -d $NLTK_DATA averaged_perceptron_tagger_eng + +VOLUME ["$MEALIE_HOME/data/"] +ENV APP_PORT=9000 + +EXPOSE ${APP_PORT} + +COPY --from=backend-builder /src/docker/healthcheck.sh $MEALIE_HOME/healthcheck.sh +RUN chmod +x $MEALIE_HOME/healthcheck.sh +HEALTHCHECK CMD $MEALIE_HOME/healthcheck.sh + +ENV HOST=0.0.0.0 + +COPY --from=backend-builder /src/docker/entry.sh $MEALIE_HOME/run.sh +RUN chmod +x $MEALIE_HOME/run.sh + +ARG CONTAINER_APP_VERSION +LABEL org.opencontainers.image.title="Mealie" +LABEL org.opencontainers.image.description="Self-hosted recipe manager" +LABEL 
org.opencontainers.image.version="${CONTAINER_APP_VERSION}" +LABEL org.opencontainers.image.source="https://forge.eblu.me/eblume/blumeops" +LABEL org.opencontainers.image.vendor="blumeops" + +ENTRYPOINT ["/app/run.sh"] diff --git a/containers/mealie/default.nix b/containers/mealie/default.nix deleted file mode 100644 index e55efe3..0000000 --- a/containers/mealie/default.nix +++ /dev/null @@ -1,69 +0,0 @@ -# Nix-built Mealie for ringtail (amd64). -# -# Replaces the from-source Dockerfile build (Node frontend + Python venv) -# with nixpkgs' mealie, which ships a single `mealie` gunicorn entrypoint -# serving the prebuilt frontend + backend — so this is a clean single- -# process wrap (unlike paperless, which is multi-process). -# -# Mealie stores its DB as SQLite under DATA_DIR (the mealie-data PVC at -# /app/data); there is no postgres. The run wrapper mirrors the nixpkgs -# mealie NixOS module: run `libexec/init_db` (Alembic migrations) first, -# then exec gunicorn. -# -# Self-pins nixos-unstable: stable nixpkgs lags at 3.9.2, unstable carries -# 3.16.0. This is a forward 4-minor bump from the v3.12.0 Dockerfile build -# (the deferred upgrade) — mealie auto-migrates the SQLite DB forward on -# startup via init_db; the source PVC is retained for rollback. The version -# assertion makes nix-build fail if a pin bump changes the version. -let - nixpkgs = fetchTarball { - url = "https://github.com/NixOS/nixpkgs/archive/331800de5053fcebacf6813adb5db9c9dca22a0c.tar.gz"; - sha256 = "1p54fm6dkbq62kpi55cr4wyx7b1nsajpsnjgs64cmp073fwi15f7"; - }; - pkgs = import nixpkgs { system = "x86_64-linux"; }; - - version = "3.16.0"; - - app = pkgs.mealie; - - # Mirror the NixOS module's mealie service: init_db (Alembic) then - # gunicorn bound to the app port. DATA_DIR/env come from the image + - # k8s manifest. 
- mealie-run = pkgs.writeShellScriptBin "mealie-run" '' - set -e - ${app}/libexec/init_db - exec ${pkgs.lib.getExe app} -b 0.0.0.0:9000 - ''; -in - -assert app.version == version; - -pkgs.dockerTools.buildLayeredImage { - name = "blumeops/mealie"; - - contents = [ - app - mealie-run - pkgs.bashInteractive - pkgs.coreutils - pkgs.cacert - pkgs.tzdata - # python3 (stdlib sqlite3) for the borgmatic k8s-sqlite-dump helper, - # which runs `python3 -c "...sqlite3...backup..."` inside the pod. - # Same nixpkgs python mealie is built against, so ~no added closure. - pkgs.python3 - ]; - - config = { - Cmd = [ "${mealie-run}/bin/mealie-run" ]; - Env = [ - "DATA_DIR=/app/data" - "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" - "PYTHONUNBUFFERED=1" - "PRODUCTION=true" - ]; - ExposedPorts = { - "9000/tcp" = { }; - }; - }; -} diff --git a/containers/paperless/Dockerfile b/containers/paperless/Dockerfile new file mode 100644 index 0000000..a7b4e65 --- /dev/null +++ b/containers/paperless/Dockerfile @@ -0,0 +1,156 @@ +# syntax=docker/dockerfile:1 +# Paperless-ngx — self-hosted document management +# Built from source via forge mirror of paperless-ngx/paperless-ngx +# Closely follows upstream Dockerfile structure with git clone instead of COPY + +ARG CONTAINER_APP_VERSION=v2.20.13 + +############################################### +# Stage 1: Clone source (reused by later stages) +############################################### +FROM docker.io/library/alpine:3.22 AS source + +ARG CONTAINER_APP_VERSION +RUN apk add --no-cache git +RUN git clone --depth 1 --branch ${CONTAINER_APP_VERSION} \ + https://forge.ops.eblu.me/mirrors/paperless-ngx.git /src + +############################################### +# Stage 2: Compile frontend +############################################### +FROM --platform=$BUILDPLATFORM docker.io/node:20-trixie-slim AS compile-frontend + +COPY --from=source /src/src-ui /src/src-ui +WORKDIR /src/src-ui + +RUN set -eux \ + && npm update -g pnpm \ + && 
npm install -g corepack@latest \ + && corepack enable \ + && pnpm install + +RUN set -eux \ + && ./node_modules/.bin/ng build --configuration production + +############################################### +# Stage 3: s6-overlay base +############################################### +FROM ghcr.io/astral-sh/uv:0.9.15-python3.12-trixie-slim AS s6-overlay-base + +WORKDIR /usr/src/s6 + +ENV S6_BEHAVIOUR_IF_STAGE2_FAILS=2 \ + S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0 \ + S6_VERBOSITY=1 \ + PATH=/command:$PATH + +ARG TARGETARCH +ARG TARGETVARIANT +ARG S6_OVERLAY_VERSION=3.2.1.0 + +RUN set -eux \ + && apt-get update \ + && apt-get install --yes --quiet --no-install-recommends curl xz-utils \ + && S6_ARCH="" \ + && if [ "${TARGETARCH}${TARGETVARIANT}" = "amd64" ]; then S6_ARCH="x86_64"; \ + elif [ "${TARGETARCH}${TARGETVARIANT}" = "arm64" ]; then S6_ARCH="aarch64"; fi \ + && if [ -z "${S6_ARCH}" ]; then echo "Error: Cannot determine arch"; exit 1; fi \ + && curl --fail --silent --show-error --location --remote-name-all --parallel \ + "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz" \ + "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz.sha256" \ + "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${S6_ARCH}.tar.xz" \ + "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${S6_ARCH}.tar.xz.sha256" \ + && sha256sum --check ./*.sha256 \ + && tar --directory / -Jxpf s6-overlay-noarch.tar.xz \ + && tar --directory / -Jxpf s6-overlay-${S6_ARCH}.tar.xz \ + && rm ./*.tar.xz ./*.sha256 \ + && apt-get --yes purge curl xz-utils \ + && apt-get --yes autoremove --purge \ + && rm -rf /var/lib/apt/lists/* + +# Copy rootfs (s6 service definitions, init scripts) +COPY --from=source /src/docker/rootfs / + +############################################### +# Stage 4: Main application 
+############################################### +FROM s6-overlay-base AS main-app + +ARG CONTAINER_APP_VERSION +ARG DEBIAN_FRONTEND=noninteractive +ARG TARGETARCH +ARG JBIG2ENC_VERSION=0.30 + +ENV PYTHONDONTWRITEBYTECODE=1 \ + PYTHONUNBUFFERED=1 \ + PYTHONWARNINGS="ignore:::django.http.response:517" \ + PNGX_CONTAINERIZED=1 \ + UV_LINK_MODE=copy \ + UV_CACHE_DIR=/cache/uv/ + +# Runtime packages +RUN set -eux \ + && apt-get update \ + && apt-get install --yes --quiet --no-install-recommends \ + curl gosu tzdata fonts-liberation gettext ghostscript gnupg \ + icc-profiles-free imagemagick postgresql-client \ + tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \ + tesseract-ocr-ita tesseract-ocr-spa unpaper pngquant jbig2dec \ + libxml2 libxslt1.1 qpdf file libmagic1 media-types zlib1g \ + libzbar0 poppler-utils \ + && curl --fail --silent --show-error --location --remote-name-all \ + "https://github.com/paperless-ngx/builder/releases/download/jbig2enc-trixie-v${JBIG2ENC_VERSION}/jbig2enc_${JBIG2ENC_VERSION}-1_${TARGETARCH}.deb" \ + && dpkg --install ./jbig2enc_${JBIG2ENC_VERSION}-1_${TARGETARCH}.deb \ + && cp /etc/ImageMagick-6/paperless-policy.xml /etc/ImageMagick-6/policy.xml \ + && rm --force *.deb \ + && rm -rf /var/lib/apt/lists/* + +WORKDIR /usr/src/paperless/src/ + +# Python dependencies +COPY --from=source /src/pyproject.toml /src/uv.lock /usr/src/paperless/src/ + +RUN --mount=type=cache,target=${UV_CACHE_DIR},id=python-cache \ + set -eux \ + && apt-get update \ + && apt-get install --yes --quiet --no-install-recommends \ + build-essential default-libmysqlclient-dev pkg-config \ + && uv export --quiet --no-dev --all-extras --format requirements-txt --output-file requirements.txt \ + && uv pip install --system --no-python-downloads --python-preference system --requirements requirements.txt \ + && python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/usr/share/nltk_data" snowball_data \ + && python3 -W ignore::RuntimeWarning -m 
nltk.downloader -d "/usr/share/nltk_data" stopwords \ + && python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/usr/share/nltk_data" punkt_tab \ + && apt-get --yes purge build-essential default-libmysqlclient-dev pkg-config \ + && apt-get --yes autoremove --purge \ + && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* + +# Copy backend source +COPY --from=source /src/src ./ + +# Copy compiled frontend +COPY --from=compile-frontend /src/src/documents/static/frontend/ ./documents/static/frontend/ + +# Create user and finalize +RUN set -eux \ + && addgroup --gid 1000 paperless \ + && useradd --uid 1000 --gid paperless --home-dir /usr/src/paperless paperless \ + && mkdir -p /usr/src/paperless/data /usr/src/paperless/media \ + /usr/src/paperless/consume /usr/src/paperless/export \ + && chown -R paperless:paperless /usr/src/paperless \ + && s6-setuidgid paperless python3 manage.py collectstatic --clear --no-input --link \ + && s6-setuidgid paperless python3 manage.py compilemessages + +VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", \ + "/usr/src/paperless/consume", "/usr/src/paperless/export"] + +ENTRYPOINT ["/init"] +EXPOSE 8000 + +HEALTHCHECK --interval=30s --timeout=10s --retries=5 \ + CMD [ "curl", "-fs", "-S", "-L", "--max-time", "2", "http://localhost:8000" ] + +LABEL org.opencontainers.image.title="Paperless-ngx" +LABEL org.opencontainers.image.description="Self-hosted document management system" +LABEL org.opencontainers.image.version="${CONTAINER_APP_VERSION}" +LABEL org.opencontainers.image.source="https://forge.eblu.me/eblume/blumeops" +LABEL org.opencontainers.image.vendor="blumeops" diff --git a/containers/paperless/default.nix b/containers/paperless/default.nix deleted file mode 100644 index 734d909..0000000 --- a/containers/paperless/default.nix +++ /dev/null @@ -1,77 +0,0 @@ -# Nix-built Paperless-ngx for ringtail (amd64). 
-# -# Replaces the from-source Dockerfile build (s6-overlay) with nixpkgs' -# paperless-ngx, which already bundles the full OCR/imaging closure -# (tesseract, ghostscript, imagemagick, qpdf, poppler, jbig2enc) and the -# NLTK data via wrappers — so the image stays lean. -# -# Unlike the upstream s6 image, this image does NOT run all processes -# itself. Paperless is multi-process; on ringtail it runs as four -# containers sharing this one image, each with a different command: -# web -> paperless-web (granian, the wrapper below) -# worker -> celery --app paperless worker -# beat -> celery --app paperless beat -# consumer -> paperless-ngx document_consumer -# plus a redis/valkey sidecar. The PYTHONPATH/granian invocation mirrors -# the nixpkgs paperless NixOS module's paperless-web service exactly. -# -# Self-pins nixos-unstable: stable nixpkgs lags at 2.19.6, while unstable -# carries 2.20.15 — a same-minor forward patch bump from the previous -# Dockerfile build (v2.20.13). The version assertion makes nix-build fail -# if a pin bump changes the version, forcing an explicit acknowledgment -# here and in service-versions.yaml (enforced by container-version-check). -let - nixpkgs = fetchTarball { - url = "https://github.com/NixOS/nixpkgs/archive/331800de5053fcebacf6813adb5db9c9dca22a0c.tar.gz"; - sha256 = "1p54fm6dkbq62kpi55cr4wyx7b1nsajpsnjgs64cmp073fwi15f7"; - }; - pkgs = import nixpkgs { system = "x86_64-linux"; }; - - version = "2.20.15"; - - app = pkgs.paperless-ngx; - - # Mirror the NixOS module's paperless-web service: granian serving the - # ASGI app with the package's propagated deps + src on PYTHONPATH. 
- pythonPath = - "${app.python.pkgs.makePythonPath app.propagatedBuildInputs}:${app}/lib/paperless-ngx/src"; - - paperless-web = pkgs.writeShellScriptBin "paperless-web" '' - export PYTHONPATH="${pythonPath}" - export PAPERLESS_NLTK_DIR="${app.nltkDataDir}" - exec ${app.python.pkgs.granian}/bin/granian \ - --interface asginl --ws \ - --host 0.0.0.0 --port 8000 \ - "paperless.asgi:application" - ''; -in - -assert app.version == version; - -pkgs.dockerTools.buildLayeredImage { - name = "blumeops/paperless"; - - contents = [ - app - paperless-web - pkgs.bashInteractive - pkgs.coreutils - pkgs.cacert - pkgs.tzdata - ]; - - config = { - # Default command is the web server; worker/beat/consumer containers - # override `command` in their k8s manifests. - Cmd = [ "${paperless-web}/bin/paperless-web" ]; - Env = [ - "PAPERLESS_NLTK_DIR=${app.nltkDataDir}" - "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" - "PYTHONUNBUFFERED=1" - "PNGX_CONTAINERIZED=1" - ]; - ExposedPorts = { - "8000/tcp" = { }; - }; - }; -} diff --git a/containers/shower/default.nix b/containers/shower/default.nix index c5bd41e..d9863e1 100644 --- a/containers/shower/default.nix +++ b/containers/shower/default.nix @@ -1,15 +1,11 @@ # Nix-built shower app container — Adelaide / Heidi / Addie baby shower. # # The app is published as a wheel to the Forgejo PyPI index at -# https://forge.ops.eblu.me/api/packages/eblume/pypi/ (tailnet-only — the -# public forge.eblu.me /api/packages/* surface is blocked at the Fly edge). -# We can't point pip at Forgejo's simple index even from the tailnet, -# because Forgejo's index returns absolute file URLs hardcoded to its -# public ROOT_URL (forge.eblu.me), which then 403s. So both the wheel and -# the sdist are pulled by direct `fetchurl` against forge.ops.eblu.me, and -# the wheel is then handed to `pip install` as a local path; transitive -# deps come from pypi.ops.eblu.me. 
Build runs on the nix-container-builder -# runner (ringtail, amd64) so the image is native. +# https://forge.eblu.me/api/packages/eblume/pypi/. The wheel + its +# transitive Python deps are baked in at build time via a fixed-output +# derivation that runs `pip install --target` against forge PyPI (proxied +# through pypi.ops.eblu.me for upstream packages). Build runs on the +# nix-container-builder runner (ringtail, amd64) so the image is native. # # Going through pip-install-target rather than nixpkgs Python packages # sidesteps two issues we hit going through `python.pkgs.buildPythonPackage`: @@ -25,7 +21,7 @@ { pkgs ? import { } }: let - version = "1.1.3"; + version = "1.0.2"; python = pkgs.python314; @@ -43,17 +39,7 @@ let showerSdist = pkgs.fetchurl { name = "adelaide_baby_shower_app-${version}.tar.gz"; url = "https://forge.ops.eblu.me/api/packages/eblume/pypi/files/adelaide-baby-shower-app/${version}/adelaide_baby_shower_app-${version}.tar.gz"; - hash = "sha256-a3rCwEdOB+rnYXqsWDifyltpyKUgkOj0ikWB+WGQYKE="; - }; - - # Wheel pulled from forge.ops.eblu.me (tailnet) for the same reason the - # sdist is: Forgejo's PyPI simple index would return forge.eblu.me URLs - # that the Fly edge 403s on /api/packages/*. We hand this path to pip - # below so it never touches the forge index at all. 
- showerWheel = pkgs.fetchurl { - name = "adelaide_baby_shower_app-${version}-py3-none-any.whl"; - url = "https://forge.ops.eblu.me/api/packages/eblume/pypi/files/adelaide-baby-shower-app/${version}/adelaide_baby_shower_app-${version}-py3-none-any.whl"; - hash = "sha256-a6j91gBigG4IzE2DVTBntnZ46Yrx9b5PgHn+Uro98Tk="; + hash = "sha256-nlCtlx9zuYaLoJZSckybLV5YPpA8vZamN96O3RXOstM="; }; staticAssets = pkgs.runCommand "shower-static-assets-${version}" { } '' @@ -82,16 +68,11 @@ let ${python}/bin/python -m venv "$TMPDIR/venv" "$TMPDIR/venv/bin/pip" install --upgrade pip - - # Nix store paths embed a 32-char hash prefix, which pip's wheel - # filename parser rejects ("Invalid wheel filename"). Copy to a - # clean filename in TMPDIR before installing. - cp ${showerWheel} "$TMPDIR/${showerWheel.name}" - "$TMPDIR/venv/bin/pip" install \ --no-cache-dir \ --index-url=https://pypi.ops.eblu.me/root/pypi/+simple/ \ - "$TMPDIR/${showerWheel.name}" \ + --extra-index-url=https://forge.ops.eblu.me/api/packages/eblume/pypi/simple/ \ + "adelaide-baby-shower-app==${version}" \ gunicorn runHook postBuild @@ -148,7 +129,7 @@ let outputHashAlgo = "sha256"; # Pinned dep closure — reproducible until version bumps. To recompute, # set to pkgs.lib.fakeHash and read the failure. - outputHash = "sha256-1xx2qWAIwherklHIPXo6IOKkKHML1KUrUx6pbkMxffc="; + outputHash = "sha256-tSTH/HaDY7M0qxlauBTM+JekZAgF++K2lGP3PLvym/o="; dontFixup = true; }; diff --git a/containers/teslamate/container.py b/containers/teslamate/container.py new file mode 100644 index 0000000..519d77d --- /dev/null +++ b/containers/teslamate/container.py @@ -0,0 +1,104 @@ +"""TeslaMate — Tesla data logger. + +Two-stage build: Elixir+Node (builder), Debian slim (runtime). +Source cloned from forge mirror. 
+""" + +import dagger +from dagger import dag + +from blumeops.containers import clone_from_forge, oci_labels + +VERSION = "v3.0.0" + + +async def build(src: dagger.Directory) -> dagger.Container: + source = clone_from_forge("teslamate", VERSION) + + # Stage 1: Build Elixir release with Node.js assets + builder = ( + dag.container() + .from_("elixir:1.19.5-otp-26") + .with_exec( + [ + "bash", + "-c", + "apt-get update" + " && apt-get install -y ca-certificates curl gnupg git zstd brotli" + " && mkdir -p /etc/apt/keyrings" + " && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key" + " | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg" + ' && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg]' + ' https://deb.nodesource.com/node_22.x nodistro main"' + " > /etc/apt/sources.list.d/nodesource.list" + " && apt-get update" + " && apt-get install -y nodejs" + " && apt-get clean" + " && rm -rf /var/lib/apt/lists/*", + ] + ) + .with_exec(["mix", "local.rebar", "--force"]) + .with_exec(["mix", "local.hex", "--force"]) + .with_directory("/opt/app", source) + .with_workdir("/opt/app") + .with_env_variable("MIX_ENV", "prod") + .with_exec(["mix", "deps.get", "--only", "prod"]) + .with_exec(["mix", "deps.compile"]) + .with_exec( + [ + "npm", + "ci", + "--prefix", + "./assets", + "--progress=false", + "--no-audit", + "--loglevel=error", + ] + ) + .with_exec(["mix", "assets.deploy"]) + .with_exec(["mix", "compile"]) + .with_exec( + ["bash", "-c", "SKIP_LOCALE_DOWNLOAD=true mix release --path /opt/built"] + ) + ) + + # Stage 2: Debian slim runtime + entrypoint = src.file("containers/teslamate/entrypoint.sh") + + runtime = ( + dag.container() + .from_("debian:trixie-slim") + .with_exec( + [ + "bash", + "-c", + "apt-get update && apt-get install -y --no-install-recommends" + " libodbc2 libsctp1 libssl3t64 libstdc++6" + " netcat-openbsd tini tzdata" + " && apt-get clean" + " && rm -rf /var/lib/apt/lists/*" + " && groupadd --gid 10001 --system nonroot" + " && 
useradd --uid 10000 --system --gid nonroot" + " --home-dir /home/nonroot --shell /sbin/nologin nonroot", + ] + ) + ) + runtime = oci_labels( + runtime, + title="TeslaMate", + description="Tesla data logger and visualization", + version=VERSION, + ) + return ( + runtime.with_env_variable("LANG", "C.UTF-8") + .with_env_variable("SRTM_CACHE", "/opt/app/.srtm_cache") + .with_env_variable("HOME", "/opt/app") + .with_workdir("/opt/app") + .with_directory("/opt/app", builder.directory("/opt/built"), owner="nonroot") + .with_exec(["mkdir", "-p", "/opt/app/.srtm_cache"]) + .with_file("/entrypoint.sh", entrypoint, permissions=0o555, owner="nonroot") + .with_user("nonroot") + .with_exposed_port(4000) + .with_entrypoint(["tini", "--", "/bin/dash", "/entrypoint.sh"]) + .with_default_args(args=["bin/teslamate", "start"]) + ) diff --git a/containers/teslamate/default.nix b/containers/teslamate/default.nix deleted file mode 100644 index e126561..0000000 --- a/containers/teslamate/default.nix +++ /dev/null @@ -1,122 +0,0 @@ -# Nix-built TeslaMate for ringtail (amd64). -# -# Replaces the Dagger container.py (Elixir+Node builder -> Debian slim). -# TeslaMate is NOT in nixpkgs, so this is a from-scratch beamPackages -# mixRelease: an Elixir/Phoenix release with npm-built assets. -# -# Pinned to the same nixos-unstable rev as paperless/mealie for a -# consistent toolchain. The BEAM combo is pinned to erlang_27 + elixir_1_18 -# (teslamate requires elixir ~> 1.17; upstream's image uses OTP 26, so we -# stay off the default OTP 28 which elixir 1.18 does not target). -# -# Source comes from the forge mirror (supply-chain control), pinned by the -# v3.0.0 tag's commit so builtins.fetchGit needs no hash. 
-let - nixpkgs = fetchTarball { - url = "https://github.com/NixOS/nixpkgs/archive/331800de5053fcebacf6813adb5db9c9dca22a0c.tar.gz"; - sha256 = "1p54fm6dkbq62kpi55cr4wyx7b1nsajpsnjgs64cmp073fwi15f7"; - }; - pkgs = import nixpkgs { system = "x86_64-linux"; }; - lib = pkgs.lib; - - version = "3.0.0"; - - beamPackages = pkgs.beam.packages.erlang_27; - elixir = beamPackages.elixir_1_18; - - src = builtins.fetchGit { - url = "https://forge.ops.eblu.me/mirrors/teslamate.git"; - ref = "refs/tags/v${version}"; - rev = "3281154d42330786a182c1bbe094ecda0b1c5578"; - }; - - # ex_cldr downloads locale JSON from GitHub at compile time, which the - # build sandbox blocks. teslamate's cldr.ex reads the data dir from the - # LOCALES env var; point it at the pre-fetched elixir-cldr data so no - # download is attempted (with SKIP_LOCALE_DOWNLOAD=true disabling the - # forced refresh). CLDR data version matches the compile-time errors. - cldrData = pkgs.fetchFromGitHub { - owner = "elixir-cldr"; - repo = "cldr"; - rev = "v2.46.0"; - sha256 = "1iwzk9dc754l72vpf8vsisdjncnjx26pz509552b6vnm49xbxyji"; - }; - - teslamate = beamPackages.mixRelease { - pname = "teslamate"; - inherit version src elixir; - - # Keep the build-generated Erlang cookie in the release. mixRelease - # strips it by default (expecting RELEASE_COOKIE at runtime), but the - # start script reads releases/COOKIE. teslamate is single-node (no - # distributed Erlang exposed), so a baked-in cookie is fine. - removeCookie = false; - - mixFodDeps = beamPackages.fetchMixDeps { - pname = "mix-deps-teslamate"; - inherit src version elixir; - hash = "sha256-DDrREiM1BIMgD2qFPTK8QyjOYlnfE3XlnaH/jk7G2go="; - }; - - # Frontend assets. esbuild + sass are devDeps and the esbuild platform - # binary is an optional dep, so npm ci must include both. We run npm ci - # here (not a separate derivation) because assets/package.json has - # file:../deps/phoenix references that only resolve once mixFodDeps has - # populated deps/. 
npmConfigHook wires up the offline cache from npmDeps; - # then `node scripts/build.js` (custom esbuild) + `mix phx.digest`. - nativeBuildInputs = [ pkgs.nodejs pkgs.npmHooks.npmConfigHook ]; - npmDeps = pkgs.fetchNpmDeps { - name = "teslamate-npm-deps"; - src = src + "/assets"; - hash = "sha256-XyiaUkT/c4rZnNxmxhVLb+vEXnc64A1hjOrnR5fhaEk="; - }; - npmRoot = "assets"; - - preBuild = '' - export SKIP_LOCALE_DOWNLOAD=true - export LOCALES=${cldrData}/priv/cldr - ( cd assets && npm ci --include=dev --include=optional && node scripts/build.js ) - mix phx.digest --no-deps-check - ''; - }; -in - -pkgs.dockerTools.buildLayeredImage { - name = "blumeops/teslamate"; - - contents = [ - teslamate - pkgs.bashInteractive - pkgs.coreutils - pkgs.dash - pkgs.netcat-openbsd - pkgs.cacert - pkgs.tzdata - ]; - - config = { - # Mirror entrypoint.sh: wait for postgres, run migrations, then start. - Entrypoint = [ - "${pkgs.dash}/bin/dash" - "-c" - '' - : "''${DATABASE_HOST:=127.0.0.1}" - : "''${DATABASE_PORT:=5432}" - while ! ${pkgs.netcat-openbsd}/bin/nc -z "$DATABASE_HOST" "$DATABASE_PORT" 2>/dev/null; do - echo "waiting for postgres at $DATABASE_HOST:$DATABASE_PORT"; sleep 1 - done - ${teslamate}/bin/teslamate eval "TeslaMate.Release.migrate" - exec ${teslamate}/bin/teslamate start - '' - ]; - Env = [ - "HOME=/opt/app" - "SRTM_CACHE=/opt/app/.srtm_cache" - "LANG=C.UTF-8" - "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" - ]; - ExposedPorts = { - "4000/tcp" = { }; - }; - }; -} diff --git a/containers/teslamate/entrypoint.sh b/containers/teslamate/entrypoint.sh new file mode 100644 index 0000000..f66117e --- /dev/null +++ b/containers/teslamate/entrypoint.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env dash +set -e + +: "${DATABASE_HOST:="127.0.0.1"}" +: "${DATABASE_PORT:=5432}" +: "${ULIMIT_MAX_NOFILE:=65536}" + +# prevent memory bloat in some misconfigured versions of Docker/containerd +# where the nofiles limit is very large. 0 means don't set it. 
+if test "${ULIMIT_MAX_NOFILE}" != 0 && test "$(ulimit -n)" -gt "${ULIMIT_MAX_NOFILE}"; then + ulimit -n "${ULIMIT_MAX_NOFILE}" +fi + +# wait until Postgres is ready +while ! nc -z "${DATABASE_HOST}" "${DATABASE_PORT}" 2>/dev/null; do + echo waiting for postgres at "${DATABASE_HOST}":"${DATABASE_PORT}" + sleep 1s +done + +# apply migrations +bin/teslamate eval "TeslaMate.Release.migrate" + +exec "$@" diff --git a/containers/unpoller/Dockerfile b/containers/unpoller/Dockerfile new file mode 100644 index 0000000..241b375 --- /dev/null +++ b/containers/unpoller/Dockerfile @@ -0,0 +1,43 @@ +# UnPoller — UniFi metrics exporter for Prometheus +# Two-stage build: Go compilation, then minimal Alpine runtime + +ARG CONTAINER_APP_VERSION=v2.34.0 + +FROM golang:alpine3.22 AS build + +ARG CONTAINER_APP_VERSION +RUN apk add --no-cache git + +RUN git clone --depth 1 --branch ${CONTAINER_APP_VERSION} \ + https://forge.ops.eblu.me/mirrors/unpoller.git /app + +WORKDIR /app + +ENV CGO_ENABLED=0 + +RUN go build -ldflags="-s -w \ + -X main.version=${CONTAINER_APP_VERSION} \ + -X main.builtBy=blumeops \ + -X golift.io/version.Version=${CONTAINER_APP_VERSION} \ + -X golift.io/version.Branch=HEAD \ + -X golift.io/version.BuildUser=blumeops \ + -X golift.io/version.Revision=blumeops-build" \ + -o /bin/unpoller . 
+ +FROM alpine:3.22 + +ARG CONTAINER_APP_VERSION +LABEL org.opencontainers.image.title="UnPoller" +LABEL org.opencontainers.image.description="UniFi metrics exporter for Prometheus" +LABEL org.opencontainers.image.version="${CONTAINER_APP_VERSION}" +LABEL org.opencontainers.image.source="https://forge.eblu.me/eblume/blumeops" +LABEL org.opencontainers.image.vendor="blumeops" + +RUN apk add --no-cache ca-certificates tzdata + +COPY --from=build /bin/unpoller /usr/bin/unpoller + +EXPOSE 9130 +USER 65534:65534 +ENTRYPOINT ["/usr/bin/unpoller"] +CMD ["--config", "/etc/unpoller/up.conf"] diff --git a/containers/unpoller/container.py b/containers/unpoller/container.py deleted file mode 100644 index bfc75ba..0000000 --- a/containers/unpoller/container.py +++ /dev/null @@ -1,53 +0,0 @@ -"""UnPoller — UniFi metrics exporter for Prometheus. - -Two-stage build: Go backend, Alpine runtime. -Source cloned from forge mirror. -""" - -import dagger - -from blumeops.containers import ( - alpine_runtime, - clone_from_forge, - go_build, - oci_labels, -) - -VERSION = "v3.2.0" - - -async def build(src: dagger.Directory) -> dagger.Container: - source = clone_from_forge("unpoller", VERSION) - - backend = go_build( - source, - "/unpoller", - ldflags=( - f"-s -w " - f"-X main.version={VERSION} " - f"-X main.builtBy=blumeops " - f"-X golift.io/version.Version={VERSION} " - f"-X golift.io/version.Branch=HEAD " - f"-X golift.io/version.BuildUser=blumeops " - f"-X golift.io/version.Revision=blumeops-build" - ), - ) - - runtime = alpine_runtime( - extra_apk=["ca-certificates", "tzdata"], - create_user=False, - ) - runtime = oci_labels( - runtime, - title="UnPoller", - description="UniFi metrics exporter for Prometheus", - version=VERSION, - ) - return ( - runtime.with_file("/usr/bin/unpoller", backend.file("/unpoller")) - .with_exposed_port(9130) - .with_user("65534") - .with_default_args( - args=["/usr/bin/unpoller", "--config", "/etc/unpoller/up.conf"] - ) - ) diff --git 
a/containers/valkey/container.py b/containers/valkey/container.py index 34e8524..5d150e7 100644 --- a/containers/valkey/container.py +++ b/containers/valkey/container.py @@ -1,8 +1,8 @@ -"""Valkey — native Dagger build (arm64, indri). +"""Valkey — native Dagger build. Alpine 3.22 base with the `valkey` apk package (8.1.x — Redis-compatible). -Used by paperless (sidecar) on indri. immich on ringtail uses the -nix-built amd64 variant from `default.nix` in this directory. +Mirrors `docker.io/valkey/valkey:8.1-alpine`, used by paperless and immich +as a cache/queue sidecar. """ import dagger @@ -10,10 +10,9 @@ from dagger import dag from blumeops.containers import oci_labels -# Alpine 3.22 currently ships valkey 8.1.7-r0. Alpine 3.23 jumps to 9.0 — -# hold on 3.22 to keep this aligned with the 8.1 line. -VERSION = "8.1.7" -ALPINE_PIN = "8.1.7-r0" +# Alpine 3.22 ships valkey 8.1.6-r0. Alpine 3.23 jumps to 9.0 — hold on 3.22 +# to keep this a 1:1 swap for the upstream `valkey:8.1-alpine` image. +VERSION = "8.1.6-r0" ALPINE_BASE = "alpine:3.22" @@ -22,7 +21,7 @@ async def build(src: dagger.Directory) -> dagger.Container: ctr = ( dag.container() .from_(ALPINE_BASE) - .with_exec(["apk", "add", "--no-cache", f"valkey={ALPINE_PIN}"]) + .with_exec(["apk", "add", "--no-cache", f"valkey={VERSION}"]) .with_exec(["mkdir", "-p", "/data"]) .with_exec(["chown", "valkey:valkey", "/data"]) .with_workdir("/data") diff --git a/containers/valkey/default.nix b/containers/valkey/default.nix deleted file mode 100644 index 9cb1713..0000000 --- a/containers/valkey/default.nix +++ /dev/null @@ -1,30 +0,0 @@ -# Nix-built Valkey for ringtail (amd64) -# Companion to container.py (Alpine 3.22, arm64 on indri). -# Used by immich-ringtail which needs an amd64 image; paperless on indri -# continues to use the Alpine container.py build. 
-# -# The version assertion ensures nix-build fails if a flake.lock update -# changes the Valkey version — forcing an explicit version acknowledgment -# here and in service-versions.yaml (enforced by container-version-check). -{ pkgs ? import <nixpkgs> { } }: - -let - version = "8.1.7"; -in - -assert pkgs.valkey.version == version; - -pkgs.dockerTools.buildLayeredImage { - name = "blumeops/valkey"; - contents = [ - pkgs.valkey - ]; - - config = { - Entrypoint = [ "${pkgs.valkey}/bin/valkey-server" ]; - Cmd = [ "--bind" "0.0.0.0" "--protected-mode" "no" "--dir" "/data" ]; - ExposedPorts = { - "6379/tcp" = { }; - }; - }; -} diff --git a/docs/changelog.d/+agent-file-neutralization.ai.md b/docs/changelog.d/+agent-file-neutralization.ai.md new file mode 100644 index 0000000..da16fba --- /dev/null +++ b/docs/changelog.d/+agent-file-neutralization.ai.md @@ -0,0 +1 @@ +Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path. diff --git a/docs/changelog.d/+alloy-main-sha-rebuild.infra.md b/docs/changelog.d/+alloy-main-sha-rebuild.infra.md new file mode 100644 index 0000000..42a7b37 --- /dev/null +++ b/docs/changelog.d/+alloy-main-sha-rebuild.infra.md @@ -0,0 +1,5 @@ +Rebuild and retag alloy v1.16.0 container images from the main-branch SHA +following the squash-merge of #345, per the build-container-image +squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`) +now reference `9564435` rather than the branch SHA `26a3ab5`, restoring +source traceability after branch cleanup. diff --git a/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md b/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md new file mode 100644 index 0000000..471990f --- /dev/null +++ b/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md @@ -0,0 +1,6 @@ +Upgrade native macOS Alloy on indri to v1.16.0.
Built on gilbert with Go +1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale +MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned, +and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started +in #345 — all four Alloy services (alloy-k8s, alloy-ringtail, +alloy-tracing-ringtail, alloy ansible) now run v1.16.0. diff --git a/docs/changelog.d/+argocd-resource-limits.infra.md b/docs/changelog.d/+argocd-resource-limits.infra.md new file mode 100644 index 0000000..ba24a5a --- /dev/null +++ b/docs/changelog.d/+argocd-resource-limits.infra.md @@ -0,0 +1 @@ +Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events. diff --git a/docs/changelog.d/+blumeops-tasks-due-recurrence.feature.md b/docs/changelog.d/+blumeops-tasks-due-recurrence.feature.md new file mode 100644 index 0000000..83072dd --- /dev/null +++ b/docs/changelog.d/+blumeops-tasks-due-recurrence.feature.md @@ -0,0 +1 @@ +`blumeops-tasks` now annotates each task with a human-readable due offset (`5d overdue` / `due in 2d` / `due today`) and a `↻ ` marker for recurring tasks, and sorts by overdue-ness (most overdue first, no-due-date last) with priority as tiebreaker. diff --git a/docs/changelog.d/+claude-md-import-agents.ai.md b/docs/changelog.d/+claude-md-import-agents.ai.md new file mode 100644 index 0000000..f63231e --- /dev/null +++ b/docs/changelog.d/+claude-md-import-agents.ai.md @@ -0,0 +1 @@ +CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally. 
diff --git a/docs/changelog.d/+compliance-mute-categories.doc.md b/docs/changelog.d/+compliance-mute-categories.doc.md new file mode 100644 index 0000000..c776e46 --- /dev/null +++ b/docs/changelog.d/+compliance-mute-categories.doc.md @@ -0,0 +1 @@ +New explanation article [[compliance-mute-categories]] documenting the gap between current `CC:`-only mute tagging and the three structurally distinct categories (compensating control, not-applicable, risk-accepted) needed for real PCI DSS / SOC2 practice. Captures the current image-scan mutelist gap (`cronjob-image-scan.yaml` doesn't pass `--mutelist-file`) and proposes an order-of-operations for wiring it up alongside the new tag conventions. Triggered by CVE-2026-31789, an OpenSSL 32-bit-only finding that surfaced the need for an NA category. diff --git a/docs/changelog.d/+container-build-suggest-runner-logs.misc.md b/docs/changelog.d/+container-build-suggest-runner-logs.misc.md new file mode 100644 index 0000000..d10ea51 --- /dev/null +++ b/docs/changelog.d/+container-build-suggest-runner-logs.misc.md @@ -0,0 +1 @@ +`container-build-and-release` now prints the specific `mise run runner-logs ` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered. diff --git a/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md b/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md deleted file mode 100644 index 2e931d4..0000000 --- a/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md +++ /dev/null @@ -1 +0,0 @@ -Rebuilt the locally-built external-secrets image from the `main` branch so the deployed tag (`v2.2.0-0e70a1b`) traces to a `main` commit rather than the now-merged feature branch, giving a stable provenance reference. 
diff --git a/docs/changelog.d/+external-secrets-stable-main-sha.infra.md b/docs/changelog.d/+external-secrets-stable-main-sha.infra.md deleted file mode 100644 index fbe3c21..0000000 --- a/docs/changelog.d/+external-secrets-stable-main-sha.infra.md +++ /dev/null @@ -1 +0,0 @@ -Rebuilt the external-secrets images off `main` and repointed both clusters to the stable main-sha tags (`v2.2.0-13895bb` arm64 / `v2.2.0-13895bb-nix` amd64), so the deployed images on indri and ringtail trace to the same `main` commit rather than earlier feature-branch builds. diff --git a/docs/changelog.d/+fix-forge-static-assets.bugfix.md b/docs/changelog.d/+fix-forge-static-assets.bugfix.md new file mode 100644 index 0000000..de0517e --- /dev/null +++ b/docs/changelog.d/+fix-forge-static-assets.bugfix.md @@ -0,0 +1 @@ +Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests. diff --git a/docs/changelog.d/+frigate-notify-local.infra.md b/docs/changelog.d/+frigate-notify-local.infra.md new file mode 100644 index 0000000..120f915 --- /dev/null +++ b/docs/changelog.d/+frigate-notify-local.infra.md @@ -0,0 +1 @@ +Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`. diff --git a/docs/changelog.d/+heph-hub-v1.2.1.infra.md b/docs/changelog.d/+heph-hub-v1.2.1.infra.md deleted file mode 100644 index c203323..0000000 --- a/docs/changelog.d/+heph-hub-v1.2.1.infra.md +++ /dev/null @@ -1 +0,0 @@ -Bumped the indri heph hub to v1.2.1, which adds the hub `GET /config` endpoint and ships the heph-pwa **Login with Authentik** flow (Authorization Code + PKCE). Pairs with the Authentik `heph` provider redirect URIs registered earlier. 
diff --git a/docs/changelog.d/+homepage-config-perms-fix.bugfix.md b/docs/changelog.d/+homepage-config-perms-fix.bugfix.md new file mode 100644 index 0000000..20e1135 --- /dev/null +++ b/docs/changelog.d/+homepage-config-perms-fix.bugfix.md @@ -0,0 +1,5 @@ +Fixed homepage container EACCES on cold start: the nix-built image now chowns +`/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the +behavior of the old Dockerfile. Without this, homepage couldn't seed missing +skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on +its first uncached request. Caught during the ringtail cutover. diff --git a/docs/changelog.d/+prowler-rebuild-on-main.infra.md b/docs/changelog.d/+prowler-rebuild-on-main.infra.md new file mode 100644 index 0000000..107b687 --- /dev/null +++ b/docs/changelog.d/+prowler-rebuild-on-main.infra.md @@ -0,0 +1 @@ +Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes. diff --git a/docs/changelog.d/+remove-devpi-container-build.misc.md b/docs/changelog.d/+remove-devpi-container-build.misc.md new file mode 100644 index 0000000..8ebec54 --- /dev/null +++ b/docs/changelog.d/+remove-devpi-container-build.misc.md @@ -0,0 +1 @@ +Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name. diff --git a/docs/changelog.d/+review-cc-ephemeral-privileged-jobs.misc.md b/docs/changelog.d/+review-cc-ephemeral-privileged-jobs.misc.md new file mode 100644 index 0000000..14dcdca --- /dev/null +++ b/docs/changelog.d/+review-cc-ephemeral-privileged-jobs.misc.md @@ -0,0 +1 @@ +Reviewed compensating control `ephemeral-privileged-jobs`: TTL and hostPID scope verified on indri. 
Noted that the alloy-tracing DaemonSet on ringtail is out of scope until Prowler scans ringtail (tracked in Todoist). diff --git a/docs/changelog.d/+review-cc-init-container-isolation.misc.md b/docs/changelog.d/+review-cc-init-container-isolation.misc.md new file mode 100644 index 0000000..295e7f8 --- /dev/null +++ b/docs/changelog.d/+review-cc-init-container-isolation.misc.md @@ -0,0 +1 @@ +Reviewed compensating control `init-container-isolation` (35 days stale). Grafana's running pod matches the manifest and the CC's claim — only `init-chown-data` runs as root with `CHOWN`; runtime containers all run as UID 472 with all caps dropped. Retirement (replacing init-chown-data with `fsGroup` alone) is plausible given the in-tree minikube-hostpath provisioner, but deferred until grafana lands on ringtail's k3s — note added to the CC. diff --git a/docs/changelog.d/+review-cc-trusted-ci-only.misc.md b/docs/changelog.d/+review-cc-trusted-ci-only.misc.md new file mode 100644 index 0000000..89dc653 --- /dev/null +++ b/docs/changelog.d/+review-cc-trusted-ci-only.misc.md @@ -0,0 +1 @@ +Reviewed compensating control `trusted-ci-only`: Forgejo runner is registered only to the private forge, which has registration disabled — no untrusted users can create repos or trigger privileged CI. Tightened the notes to reflect that the closed-forge property (not a per-repo allow-list) is what actually mitigates the risk. diff --git a/docs/changelog.d/+review-compliance-image-iac.feature.md b/docs/changelog.d/+review-compliance-image-iac.feature.md new file mode 100644 index 0000000..1125359 --- /dev/null +++ b/docs/changelog.d/+review-compliance-image-iac.feature.md @@ -0,0 +1 @@ +`review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). 
For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings. diff --git a/docs/changelog.d/+review-contributing-doc.doc.md b/docs/changelog.d/+review-contributing-doc.doc.md new file mode 100644 index 0000000..c394a01 --- /dev/null +++ b/docs/changelog.d/+review-contributing-doc.doc.md @@ -0,0 +1 @@ +Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`. diff --git a/docs/changelog.d/+review-index-doc.doc.md b/docs/changelog.d/+review-index-doc.doc.md new file mode 100644 index 0000000..7016a7a --- /dev/null +++ b/docs/changelog.d/+review-index-doc.doc.md @@ -0,0 +1 @@ +Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`. diff --git a/docs/changelog.d/+review-navidrome-doc.doc.md b/docs/changelog.d/+review-navidrome-doc.doc.md new file mode 100644 index 0000000..fbe5e79 --- /dev/null +++ b/docs/changelog.d/+review-navidrome-doc.doc.md @@ -0,0 +1 @@ +Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests. diff --git a/docs/changelog.d/+review-ollama-doc.doc.md b/docs/changelog.d/+review-ollama-doc.doc.md new file mode 100644 index 0000000..05ef23e --- /dev/null +++ b/docs/changelog.d/+review-ollama-doc.doc.md @@ -0,0 +1 @@ +Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`. 
diff --git a/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md b/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md new file mode 100644 index 0000000..6801040 --- /dev/null +++ b/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md @@ -0,0 +1,3 @@ +Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings. + +Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section. diff --git a/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md b/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md new file mode 100644 index 0000000..24ffcb9 --- /dev/null +++ b/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md @@ -0,0 +1 @@ +rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround. diff --git a/docs/changelog.d/+runner-logs-auth.feature.md b/docs/changelog.d/+runner-logs-auth.feature.md new file mode 100644 index 0000000..9ee6fa1 --- /dev/null +++ b/docs/changelog.d/+runner-logs-auth.feature.md @@ -0,0 +1 @@ +runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos. 
diff --git a/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md b/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md new file mode 100644 index 0000000..24bb81c --- /dev/null +++ b/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md @@ -0,0 +1 @@ +Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup. diff --git a/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md b/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md deleted file mode 100644 index cc29cf7..0000000 --- a/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -Fixed the `tailscale-operator` and `tailscale-operator-ringtail` ArgoCD apps showing `Unknown` sync status. Their shared base kustomization fetched the upstream operator manifest from the public `forge.eblu.me/mirrors/...`, which the AI-scraper mitigation now black-holes (403). Pointed the remote resource at the tailnet host `forge.ops.eblu.me` instead, which the in-cluster repo-server can reach. diff --git a/docs/changelog.d/+transmission-doc-review.doc.md b/docs/changelog.d/+transmission-doc-review.doc.md new file mode 100644 index 0000000..418504f --- /dev/null +++ b/docs/changelog.d/+transmission-doc-review.doc.md @@ -0,0 +1 @@ +Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar. diff --git a/docs/changelog.d/+valkey-main-tag-bump.infra.md b/docs/changelog.d/+valkey-main-tag-bump.infra.md new file mode 100644 index 0000000..cd19f60 --- /dev/null +++ b/docs/changelog.d/+valkey-main-tag-bump.infra.md @@ -0,0 +1 @@ +Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main. 
diff --git a/docs/changelog.d/+zot-v2.1.16.infra.md b/docs/changelog.d/+zot-v2.1.16.infra.md new file mode 100644 index 0000000..f007164 --- /dev/null +++ b/docs/changelog.d/+zot-v2.1.16.infra.md @@ -0,0 +1 @@ +Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits). diff --git a/docs/changelog.d/alloy-v1.16.0.infra.md b/docs/changelog.d/alloy-v1.16.0.infra.md new file mode 100644 index 0000000..cd9a1ef --- /dev/null +++ b/docs/changelog.d/alloy-v1.16.0.infra.md @@ -0,0 +1,5 @@ +Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments +(alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on +indri). Pulls in stable database observability (v1.15) and the OTel Collector +v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger +`container.py` per the build-container-image migration playbook. diff --git a/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md b/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md new file mode 100644 index 0000000..79a81cf --- /dev/null +++ b/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md @@ -0,0 +1 @@ +Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz`→`docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone. 
diff --git a/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md b/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md new file mode 100644 index 0000000..35f77c2 --- /dev/null +++ b/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md @@ -0,0 +1 @@ +Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper. diff --git a/docs/changelog.d/external-secrets-ringtail-nix.infra.md b/docs/changelog.d/external-secrets-ringtail-nix.infra.md deleted file mode 100644 index 9ce3f85..0000000 --- a/docs/changelog.d/external-secrets-ringtail-nix.infra.md +++ /dev/null @@ -1 +0,0 @@ -Completed the external-secrets localization for the ringtail (amd64) cluster. The indri Dagger build (`container.py`) only produces an arm64 image; added `containers/external-secrets/default.nix` to build the amd64 variant on ringtail's nix-container-builder, and gave `external-secrets-ringtail` a thin kustomize overlay that reuses the shared manifest and points at the `-nix` image. Both clusters now run the locally-built external-secrets binary on their native architecture. diff --git a/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md b/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md new file mode 100644 index 0000000..cc35684 --- /dev/null +++ b/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md @@ -0,0 +1 @@ +Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation. 
diff --git a/docs/changelog.d/heph-indri-hub.infra.md b/docs/changelog.d/heph-indri-hub.infra.md deleted file mode 100644 index 6761cb7..0000000 --- a/docs/changelog.d/heph-indri-hub.infra.md +++ /dev/null @@ -1 +0,0 @@ -Added the [[hephaestus]] (`heph`) sync hub to indri as a self-updating LaunchAgent managed by Ansible (`ansible/roles/heph`, tag `heph`). The hub runs `hephd --mode server` behind `heph.ops.eblu.me` (Caddy TLS), with self-update on a 10-minute interval and the heph-pwa mobile shell served from `--web-root`. Access is gated by a new Authentik device-code (RFC 8628) OIDC application. Indri is now the canonical hub; other devices (e.g. gilbert) attach as offline-capable spokes. The hub's store was seeded from gilbert via the data-safe Path A bring-up (copy store, reset `meta.origin`). diff --git a/docs/changelog.d/heph-offline-access.bugfix.md b/docs/changelog.d/heph-offline-access.bugfix.md deleted file mode 100644 index e9721bc..0000000 --- a/docs/changelog.d/heph-offline-access.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -Granted the `offline_access` scope on the Authentik `heph` OAuth2 provider so hephaestus spokes receive a durable 30-day refresh token. Previously the refresh token was session-bound, so spoke sync would silently fail with a `400 Bad Request` on the `refresh_token` grant once the Authentik session lapsed. diff --git a/docs/changelog.d/heph-pwa-redirect-uris.infra.md b/docs/changelog.d/heph-pwa-redirect-uris.infra.md deleted file mode 100644 index f887eed..0000000 --- a/docs/changelog.d/heph-pwa-redirect-uris.infra.md +++ /dev/null @@ -1 +0,0 @@ -Registered the heph-pwa redirect URIs (`https://heph.ops.eblu.me/`, plus `http://localhost:8787/` for dev) on the Authentik `heph` OAuth2 provider, enabling the PWA's new Authorization Code + PKCE "Login with Authentik" flow (and the token-endpoint CORS it needs). Pairs with hephaestus PR #9. 
diff --git a/docs/changelog.d/homepage-to-ringtail.infra.md b/docs/changelog.d/homepage-to-ringtail.infra.md new file mode 100644 index 0000000..1e3e795 --- /dev/null +++ b/docs/changelog.d/homepage-to-ringtail.infra.md @@ -0,0 +1,8 @@ +Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64). +The container is now built via nix (`containers/homepage/default.nix`), adapted +from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and +wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on +minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, +Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries +in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama) +auto-populate via Ingress annotations. diff --git a/docs/changelog.d/local-external-secrets.infra.md b/docs/changelog.d/local-external-secrets.infra.md deleted file mode 100644 index 13cbb05..0000000 --- a/docs/changelog.d/local-external-secrets.infra.md +++ /dev/null @@ -1 +0,0 @@ -Localized the external-secrets controller image. It now builds from the forge mirror via a native Dagger `container.py` (single `all_providers` static Go binary, faithful to upstream's `make build`) and is served from `registry.ops.eblu.me/blumeops/external-secrets` instead of `ghcr.io`, bringing another platform component under local supply-chain control. diff --git a/docs/changelog.d/migrate-cv-docs-to-indri.infra.md b/docs/changelog.d/migrate-cv-docs-to-indri.infra.md new file mode 100644 index 0000000..608a6b9 --- /dev/null +++ b/docs/changelog.d/migrate-cv-docs-to-indri.infra.md @@ -0,0 +1 @@ +Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. 
Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down. diff --git a/docs/changelog.d/migrate-devpi-to-indri.infra.md b/docs/changelog.d/migrate-devpi-to-indri.infra.md new file mode 100644 index 0000000..418db70 --- /dev/null +++ b/docs/changelog.d/migrate-devpi-to-indri.infra.md @@ -0,0 +1 @@ +Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]]. diff --git a/docs/changelog.d/mirror-tailscale-container.infra.md b/docs/changelog.d/mirror-tailscale-container.infra.md new file mode 100644 index 0000000..54ca3ba --- /dev/null +++ b/docs/changelog.d/mirror-tailscale-container.infra.md @@ -0,0 +1 @@ +Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration. diff --git a/docs/changelog.d/prowler-iac-mutelist.infra.md b/docs/changelog.d/prowler-iac-mutelist.infra.md new file mode 100644 index 0000000..793c1ec --- /dev/null +++ b/docs/changelog.d/prowler-iac-mutelist.infra.md @@ -0,0 +1 @@ +Address the 6 critical Prowler IaC findings against `argocd/manifests/`. 
Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var. Two new compensating controls — `operator-purpose-bound-rbac` and `kube-state-metrics-metadata-only` — justify muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`. diff --git a/docs/changelog.d/reviews-jun4.doc.md b/docs/changelog.d/reviews-jun4.doc.md deleted file mode 100644 index f1aeaa8..0000000 --- a/docs/changelog.d/reviews-jun4.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration. diff --git a/docs/changelog.d/reviews-jun4.infra.md b/docs/changelog.d/reviews-jun4.infra.md deleted file mode 100644 index c128e70..0000000 --- a/docs/changelog.d/reviews-jun4.infra.md +++ /dev/null @@ -1 +0,0 @@ -Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup). 
diff --git a/docs/changelog.d/shower-app-deploy.bugfix.md b/docs/changelog.d/shower-app-deploy.bugfix.md new file mode 100644 index 0000000..91d2b3b --- /dev/null +++ b/docs/changelog.d/shower-app-deploy.bugfix.md @@ -0,0 +1,13 @@ +Shower app container now bakes the wheel + Python deps into the image +at build time via `buildPythonPackage` instead of pip-installing on +first boot. Boots are deterministic and don't depend on forge PyPI +being reachable from the pod. The `wheelHash` in +`containers/shower/default.nix` is the sha256 sourced from the +[forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/); +bumping the version means bumping that hash too. + +Borgmatic now covers the shower app: SQLite is dumped from the live +pod via `kubectl exec` (mirroring the existing mealie entry, with +`context: k3s-ringtail`), and the prize-photo media share is picked up +through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as +`/Volumes/photos`). diff --git a/docs/changelog.d/shower-app-deploy.feature.md b/docs/changelog.d/shower-app-deploy.feature.md new file mode 100644 index 0000000..96218be --- /dev/null +++ b/docs/changelog.d/shower-app-deploy.feature.md @@ -0,0 +1,4 @@ +Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle +picker, and prize assignment console — on ringtail k3s with `shower.eblu.me` +as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App +source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app). diff --git a/docs/changelog.d/shower-app-deploy.infra.md b/docs/changelog.d/shower-app-deploy.infra.md new file mode 100644 index 0000000..157a068 --- /dev/null +++ b/docs/changelog.d/shower-app-deploy.infra.md @@ -0,0 +1,9 @@ +Wire shower app for public exposure: fly nginx `shower.eblu.me` server +block as a guest-only surface — splash page, `/prizes//`, static +assets, media. 
Everything authenticated (`/admin/`, `/host/`, +`/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit +`shower.ops.eblu.me` for the operator console + admin; the app's +v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on +the tailnet point back at the WAN host for guests. Plus a Caddy route +on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking +request rate, error rate, latency, bandwidth, and access logs. diff --git a/docs/changelog.d/update-tooling-deps-2026-04.doc.md b/docs/changelog.d/update-tooling-deps-2026-04.doc.md new file mode 100644 index 0000000..141e975 --- /dev/null +++ b/docs/changelog.d/update-tooling-deps-2026-04.doc.md @@ -0,0 +1 @@ +New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync. diff --git a/docs/changelog.d/update-tooling-deps-2026-04.infra.md b/docs/changelog.d/update-tooling-deps-2026-04.infra.md new file mode 100644 index 0000000..4731eca --- /dev/null +++ b/docs/changelog.d/update-tooling-deps-2026-04.infra.md @@ -0,0 +1 @@ +Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks. diff --git a/docs/changelog.d/valkey-mirror.infra.md b/docs/changelog.d/valkey-mirror.infra.md new file mode 100644 index 0000000..06f8d98 --- /dev/null +++ b/docs/changelog.d/valkey-mirror.infra.md @@ -0,0 +1 @@ +Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. 
Authentik's nix-built Redis remains separate. diff --git a/docs/explanation/ai-scraper-mitigation.md b/docs/explanation/ai-scraper-mitigation.md deleted file mode 100644 index fe4ba3d..0000000 --- a/docs/explanation/ai-scraper-mitigation.md +++ /dev/null @@ -1,201 +0,0 @@ ---- -title: AI Scraper Mitigation -modified: 2026-06-01 -last-reviewed: 2026-06-01 -tags: - - explanation - - fly-io - - forgejo - - security - - networking ---- - -# AI Scraper Mitigation on the Public Proxy - -> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words — these serve as placeholders to establish the documentation structure. - -How BlumeOps keeps AI crawlers from running up the [[expose-service-publicly|Fly.io proxy]] egress bill and DoS-ing [[forgejo|Forgejo]] on [[indri]]. - -## The incident - -A $29.60 Fly.io invoice arrived, nearly all of it a single line: - -``` -Bandwidth: Egress (iad) — 958,524,714,138 bytes — $19.17 -``` - -The `iad` (Ashburn) region is a red herring: the proxy machine runs in `sjc`, -but Fly bills egress at the edge PoP nearest the *client*, so `iad` just means -"the traffic went to clients on the US East Coast." - -Tracing it through the nginx access logs (shipped to Loki via [[alloy|Alloy]]): - -| Signal | Value | -|--------|-------| -| Total proxy egress (30d) | ~1.25 TB | -| Share that was `forge.eblu.me` | **99.95%** | -| Share of forge egress that was `/mirrors/*` | **~71%** | -| Share that was declared AI bots | **~85%+** | -| Top offenders | Meta `meta-externalagent` (66% of bytes), OpenAI `GPTBot` (16%), Amazonbot, Bytespider | -| Forgejo `5xx` (upstream timeouts) | tens of thousands/day, spiking to 112k | - -The crawlers were walking [[forgejo|Forgejo]]'s git-history browse endpoints — -`src/commit/`, `commits/`, `blame/`, `raw/commit/`, plus `.patch`/`.diff` -and `?page=N` pagination. 
That URL space is effectively **infinite**: every -file × every commit × every page, multiplied across every mirrored repo. A -crawler that follows links never finishes, and every page is a cache `MISS` -that both tunnels to indri *and* bills as egress. - -Two distinct harms, not one: - -1. **Cost** — ~1.25 TB/mo of egress on a free-tier-ish proxy. -2. **Availability** — the crawl alone generates ~400–530k requests/day, - enough to time out Forgejo regardless of how much RAM [[indri]] has. Moving - egress elsewhere would *not* fix this; the crawl has to be throttled at the - source. - -`robots.txt` already `Disallow`s `/mirrors/`, `/user/`, and archive/download -paths — but **`meta-externalagent` and `GPTBot` ignore it.** For these agents, -`robots.txt` is a dead letter, which is why edge enforcement is required. - -## The tiered plan - -### Tier 1 — Black-hole `/mirrors/*` (shipped) - -The mirror repositories (`tailscale`, `prometheus`, `mealie`, `paperless-ngx`, -…) are mirrors of *already-public upstreams*, kept for supply-chain control -(see [[spork-strategy]] and the container/mirror story in [[why-gitops]]). They -are consumed by CI, gilbert, and other tailnet clients over -`forge.ops.eblu.me`. Their web UI on the public internet served **no -legitimate audience** — only scrapers. So the proxy now returns `403` for -anything under `/mirrors/`, pointing humans at the tailnet host: - -```nginx -location ^~ /mirrors/ { - return 403 "Mirror repositories are tailnet-only — use forge.ops.eblu.me.\n"; -} -``` - -The `^~` modifier matters: without it, the regex `location` blocks for static -assets (`*.css`, `*.js`, release downloads) would match first and leak content -under `/mirrors/`. `^~` tells nginx to stop at the prefix match and skip the -regex round. - -This is config, not bot-fighting — we simply stopped serving an infinite -tarpit to the world. 
It removes ~71% of forge egress and a large share of the -upstream timeouts, with zero impact on any human or tailnet consumer. It -mirrors the existing tailnet-only blocks for `/api/packages/` and `/swagger`. - -The `403` is also a small act of public shaming. Blocked requests are served a -"roll of dishonour" page (`fly/naughty.html`, status kept at `403` via -`error_page 403 /naughty.html`) that names the offending operators and their -share of the stolen bytes, and every response carries an `X-Naughty-Scrapers` -header: - -``` -X-Naughty-Scrapers: OpenAI/GPTBot, Meta/meta-externalagent, Amazonbot, ByteDance/Bytespider — robots.txt ignorers -``` - -Petty? A little. But it costs nothing, documents *why* the block exists for the -next person who hits it, and the page is a few KB versus the megabytes of git -HTML the crawlers were taking. - -**Trade-off accepted:** mirror release-artifact downloads over WAN now also -`403`. Legitimate consumers already pull these over the tailnet, and the public -exposure was the same crawl liability, so this is intentional. - -### Tier 2 — Defend the repos that *stay* public (planned) - -`/eblume/*` is intentionally public (a public profile is a feature). But the -same git-history endpoints are still a tarpit there, just lower-volume. Two -layers, in increasing order of effort and effectiveness: - -#### 2a. User-agent denylist (cheap, evadable) - -Block the declared AI crawlers at the edge regardless of path: - -```nginx -# Illustrative — not yet deployed. -map $http_user_agent $is_ai_bot { - default 0; - "~*meta-externalagent" 1; - "~*GPTBot" 1; - "~*ClaudeBot" 1; - "~*Amazonbot" 1; - "~*Bytespider" 1; - "~*SemrushBot" 1; -} -# in the forge.eblu.me server block: -if ($is_ai_bot) { return 403; } -``` - -This catches ~85% of *current* traffic for a few lines of config. It is -trivially evadable — a scraper need only spoof a browser UA — so it is a -speed-bump, not a wall. 
Keep `robots.txt` too: well-behaved crawlers -(Googlebot, Bingbot) do honor it, and it documents intent. - -#### 2b. Anubis proof-of-work gateway (the real wall) - -[Anubis](https://github.com/TecharoHQ/anubis) is a Go reverse proxy that -weighs each request with a browser-based proof-of-work challenge before passing -it upstream. It was written for *exactly this scenario* — its author built it -after Amazon's scraper took down their Git server — and is widely deployed in -front of Forgejo/Gitea (Codeberg, the UN, etc.). Headless scrapers that can't -run the challenge JS never reach the application; humans clear it once and -proceed. - -Why it fits BlumeOps better than the alternatives: - -- **It attacks cost *and* availability at once.** Bots receive a few-KB - challenge page instead of MB of git HTML (egress collapses) and never reach - Forgejo (timeouts collapse). No other single lever does both. -- **It stays in-house.** No third party terminates our TLS or sees our - traffic. - -Placement options: - -| Where | Pros | Cons | -|-------|------|------| -| On [[indri]], between [[caddy|Caddy]] and Forgejo | Protects every path and every entry (WAN *and* tailnet); one config | Adds a hop and a service to the indri critical path; the challenge page still tunnels back through Fly for WAN clients (small egress) | -| On the Fly proxy machine, in front of nginx | Challenge served at the edge — bots never even tunnel to indri | Fly VM is small (512 MB); another moving part in the boot sequence alongside `tailscaled`/nginx/`fail2ban`/Alloy | - -Leaning toward Caddy-side on indri for simplicity and uniform coverage, but -this is the open design question for Tier 2. Anubis is MIT-licensed and the -author has signalled a future move to an `equi-x`-based challenge, so pin a -version and track upstream. 
- -### Tier 3 — Move egress off Fly entirely (rejected) - -A [[#The incident|Cloudflare]] Tunnel (`cloudflared` on indri → Cloudflare -edge) would make this a non-problem on the cost axis: Cloudflare does not meter -proxied bandwidth, and it bundles free AI-bot mitigation (Bot Fight Mode, the -"block AI scrapers" toggle, Managed Challenge, AI Labyrinth). One move would -zero the egress bill and add bot defense. - -**We are not doing this, on principle.** Cloudflare is a solid platform and a -defensible engineering choice — but it already sits in front of an enormous -fraction of the modern web, and routing BlumeOps through it would add one more -site to the pile of the internet that one company can see and gate. BlumeOps -deliberately keeps its own backbone ([[expose-service-publicly|Fly + Tailscale -+ Caddy]], DNS at [[gandi|Gandi]] — see the "no Cloudflare dependency" line in -that doc). This is a values decision, not a technical one: we would rather pay -a few dollars and run our own mitigation than centralize on Cloudflare. - -It is also worth noting that **Tier 3 would not, by itself, fix the upstream -timeouts** — free egress just means we'd stop *caring* that bots crawl, while -they continued to hammer Forgejo. Crawl mitigation (Tier 1 + Tier 2) is -required regardless of where egress is billed. 
- -## Summary - -| Tier | Lever | Cost | Availability | Status | -|------|-------|------|--------------|--------| -| 1 | Black-hole `/mirrors/*` at edge | −~71% | big drop | **shipped** | -| 2a | UA denylist on remaining repos | −most of the rest | further drop | planned | -| 2b | Anubis PoW gateway | −near-total | near-total | planned | -| 3 | Cloudflare Tunnel | −total | needs 2b anyway | **rejected (principle)** | - -The guiding insight: the cheapest, lowest-risk mitigation is to **not serve an -infinite-URL surface that has no human audience.** Everything past Tier 1 is -about defending the surface we *do* want public, in-house, without ceding -control of our traffic to a third party. diff --git a/docs/explanation/compliance-mute-categories.md b/docs/explanation/compliance-mute-categories.md new file mode 100644 index 0000000..4c5f3a3 --- /dev/null +++ b/docs/explanation/compliance-mute-categories.md @@ -0,0 +1,99 @@ +--- +title: Compliance Mute Categories +modified: 2026-05-04 +last-reviewed: 2026-05-04 +tags: + - explanation + - security + - compliance +--- + +# Compliance Mute Categories + +> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure. + +How BlumeOps should categorize muted compliance findings, why a single "compensating control" tag is not enough, and what tooling work is needed to support multiple categories cleanly. + +## Why this matters + +When a compliance scanner ([[prowler]], Trivy via Prowler IaC, Kingfisher) reports a failing finding, there are three structurally different reasons we might suppress it: + +1. **Compensating control (CC)** — the requirement applies and we *do not* meet it directly, but an alternative control mitigates the same risk. +2. **Not applicable (NA)** — the requirement's preconditions cannot be satisfied in our environment, so the finding is structurally inert (e.g. 
a 32-bit-only CVE on 64-bit-only hosts). +3. **Risk accepted (RA)** — the requirement applies, we do not meet it, no compensating control exists, and we have explicitly chosen to accept the residual risk for a bounded period. + +Today every muted finding in BlumeOps uses the `CC: ` convention. That conflates all three categories. In a real PCI DSS or SOC2 environment, auditors treat them very differently: + +- A CC requires documentation of the constraint, the alternative measure, and recurring validation that the measure still works. +- An NA requires documentation of *why* the precondition cannot be met, with periodic verification that the environmental fact still holds. +- An RA requires an explicit decision-maker, an expiry date, and a scheduled re-decision. + +Mixing them under one tag means stale CCs hide stale RAs, and NAs that should be revisited when the environment changes get treated as permanent fixtures. + +## Trigger case: CVE-2026-31789 + +The 2026-05-03 weekly compliance review surfaced [CVE-2026-31789](https://nvd.nist.gov/vuln/detail/CVE-2026-31789), an OpenSSL heap buffer overflow during X.509 certificate processing on **32-bit systems**. Prowler's image scanner flagged 216 findings across 106 BlumeOps images carrying `libssl3` / `libcrypto3` below the fixed versions. + +The CVE is genuine, but its preconditions cannot be satisfied in our environment: indri is Apple Silicon (arm64), ringtail is x86_64, and we run no 32-bit containers. This is the canonical NA case — not a CC, because there is no "alternative measure mitigating the risk." The risk does not exist for us at all. + +A CC like `no-32bit-runtimes` would technically work, but conflates the categories: if we ever introduce a 32-bit runtime we would have to remember that this CC was load-bearing for the mute, retire or scope it down, and reopen the muted findings. 
An NA tag with a short justification makes the precondition explicit and self-documents the conditions under which it must be revisited. + +## Current tooling state + +Three Prowler scans run weekly. Their mute paths today: + +| Scan | Mute mechanism | File(s) | +|------|----------------|---------| +| K8s CIS (Sunday) | Prowler `--mutelist-file`, merged from ConfigMap | `argocd/manifests/prowler/mutelist/*.yaml` | +| IaC (Saturday) | Trivy `--ignorefile` shim (Prowler's `--mutelist-file` is a no-op for IaC) | `argocd/manifests/prowler/mutelist/trivyignore.yaml` | +| Container Images (Saturday) | **None — `cronjob-image-scan.yaml` does not pass `--mutelist-file`** | n/a | + +The image scan has never been wired to a mutelist. The CSV reports do contain a `MUTED` column, but it is always `False` because no mutelist is supplied. All 14k+ image findings flow through to `review-compliance-reports` unfiltered. + +The mute tag convention is consistent across the two configured scans: each entry's `Description:` (or `statement:` for trivyignore) starts with `CC: . `. `mise run review-compensating-controls` greps for those IDs to find every file that depends on each control. There is no NA tag, no RA tag, and no expiry field. + +## Proposed model + +### Tag prefixes + +Extend the description-prefix convention: + +- `CC: . ` — references an entry in `compensating-controls.yaml`. Existing convention, unchanged. +- `NA: . ` — environmental precondition fails. Reason should be specific enough that a reviewer can verify it (e.g. `NA: no 32-bit runtimes`, not `NA: doesn't apply`). +- `RA: ; expires . ` — explicit risk acceptance with a hard expiry. Past the expiry, re-review is mandatory. + +Tag choice is exclusive: a given mute is one of CC, NA, or RA. If two reasons apply, pick the strongest — CC > RA > NA. + +### Tooling changes required + +1. 
**Wire the image scan to a mutelist.** Add `argocd/manifests/prowler/mutelist/image-cves.yaml`, mount-and-merge it the same way `cronjob.yaml` mounts its mutelist parts, and pass `--mutelist-file` to `prowler image`. Verify experimentally that `prowler image` honors the flag — Prowler's behavior across providers is inconsistent, and the IaC provider notably does not. If `prowler image` ignores it, fall back to post-scan filtering inside `review-compliance-reports`. + +2. **Teach `review-compensating-controls` (or a sibling) to surface NA and RA entries.** CCs already get a staleness queue. NAs should appear in a separate queue keyed on the reason text — when an NA reason becomes false (e.g. we do introduce a 32-bit runtime), every NA mute citing that reason must be reopened. RAs should sort by expiry date, with anything past expiry flagged red. + +3. **Expiry parsing.** RA tags carry a hard date. The simplest path is to parse it from the description string at review time. A more durable path is to extend the mutelist YAML schema with a structured `expires:` field and a small wrapper that strips it before passing the file to Prowler. Either works; the structured field is friendlier to editors. + +### Out of scope (for now) + +- Changing the underlying Prowler mutelist YAML schema. Stay within the `Mutelist:` shape Prowler expects. +- Migrating existing `CC:` entries. The current set is genuinely CCs and should stay tagged that way. +- Building an issue-tracker integration. Todoist is the source of truth for "remember to re-review this" until that scales painfully. + +## Order of operations + +When this work is picked up, the suggested sequence is: + +1. **Scope and confirm.** Re-read this article, confirm the model still fits, adjust if not. +2. **Wire the image-scan mutelist.** Smallest atomic change; produces immediate value (the CVE-2026-31789 mute can land as the first NA entry). +3. 
**Add the NA convention.** Update [[read-compliance-reports]] and [[review-compensating-controls]] how-tos to describe the three tag prefixes. The convention can land before tooling supports it — review will just be manual until tooling catches up. +4. **Extend the review tools.** Add NA and RA queues to `review-compensating-controls` (or a new task). At this point, parse expiry from RA descriptions. +5. **Optionally: structured expiry.** If RA entries become common, migrate to a structured `expires:` YAML field with a wrapper that filters it out before Prowler reads the file. + +The first three steps are a coherent C1. Steps 4–5 can be split off if scope creeps. + +## Related + +- [[read-compliance-reports]] — the weekly review process this feeds into +- [[review-compensating-controls]] — current CC review tooling +- [[security-model]] — overall security posture +- [[prowler]] — scanner reference +- [[agent-change-process]] — how to scope and execute the implementation diff --git a/docs/how-to/configuration/manage-forgejo-mirrors.md b/docs/how-to/configuration/manage-forgejo-mirrors.md index 5d150dc..9c0e113 100644 --- a/docs/how-to/configuration/manage-forgejo-mirrors.md +++ b/docs/how-to/configuration/manage-forgejo-mirrors.md @@ -137,8 +137,8 @@ Return to [GitHub token settings](https://github.com/settings/tokens?type=beta) Trigger a manual sync on one mirror to confirm the new PAT works: -1. Go to any mirror repo's settings page on forge (e.g., `https://forge.eblu.me/mirrors/cloudnative-pg/settings`) -2. In the "Mirror settings" section, click "Synchronize now" +1. Go to any mirror repo on forge (e.g., `mirrors/cloudnative-pg`) +2. Click the sync button (circular arrows icon) next to the mirror status 3. 
Confirm the sync completes without errors ## Related diff --git a/docs/how-to/configuration/rotate-fly-deploy-token.md b/docs/how-to/configuration/rotate-fly-deploy-token.md index 9abe5f0..5863f54 100644 --- a/docs/how-to/configuration/rotate-fly-deploy-token.md +++ b/docs/how-to/configuration/rotate-fly-deploy-token.md @@ -14,7 +14,7 @@ How to rotate the Fly.io API token used to deploy [[flyio-proxy]]. The token liv ## When to rotate -- Every 75 days (heph recurring task) +- Every 75 days (Todoist recurring task) - After any compromise / accidental disclosure - If `fly deploy` starts returning auth errors diff --git a/docs/how-to/configuration/rotate-gandi-pat.md b/docs/how-to/configuration/rotate-gandi-pat.md index 5ce6f81..94a0b4e 100644 --- a/docs/how-to/configuration/rotate-gandi-pat.md +++ b/docs/how-to/configuration/rotate-gandi-pat.md @@ -14,7 +14,7 @@ How to rotate the Gandi Personal Access Token. **One PAT** is shared by [[caddy] ## When to rotate -- Every 60 days (heph recurring task) +- Every 60 days (Todoist recurring task) - After any compromise / accidental disclosure - Whenever Gandi starts rejecting the PAT (see [Debugging](#debugging)) diff --git a/docs/how-to/immich/cnpg-on-ringtail.md b/docs/how-to/immich/cnpg-on-ringtail.md deleted file mode 100644 index 153e674..0000000 --- a/docs/how-to/immich/cnpg-on-ringtail.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -title: CNPG Operator on Ringtail -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - postgres - - ringtail ---- - -# CNPG Operator on Ringtail - -Bring up the `cloudnative-pg` operator on `k3s-ringtail`. Today the -operator only exists on `minikube-indri` (see -`argocd/apps/cloudnative-pg.yaml`, destination `kubernetes.default.svc`). - -Prerequisite of [[migrate-immich-to-ringtail]]; consumed by -[[immich-pg-on-ringtail]]. 
- -## What to do - -- Add a sibling `argocd/apps/cloudnative-pg-ringtail.yaml` pointing - at the same mirror (`mirrors/cloudnative-pg`, tag `v1.27.1`), - destination `https://ringtail.tail8d86e.ts.net:6443`, - namespace `cnpg-system`. -- Mirror the `ServerSideApply=true` and `CreateNamespace=true` sync - options (the CRDs exceed the annotation size limit). -- Sync `apps` then `cloudnative-pg-ringtail`. Verify the operator - pod is running on ringtail. - -## Verification - -```fish -kubectl --context=k3s-ringtail -n cnpg-system get pods -kubectl --context=k3s-ringtail get crd clusters.postgresql.cnpg.io -``` - -## Why a separate app - -Each ArgoCD app targets a single cluster via `destination.server`. -We could parameterize with ApplicationSets, but blumeops' convention -is to duplicate the manifest with a `-ringtail` suffix (see -`alloy-ringtail`, `external-secrets-ringtail`, etc.). Keep the -convention. - -## Out of scope - -- Postgres clusters themselves (`immich-pg`, etc.) — those come from - [[immich-pg-on-ringtail]]. -- Removing the minikube cnpg operator. That happens at the very end - of the indri-k8s decommission, not in this chain. diff --git a/docs/how-to/immich/immich-app-on-ringtail.md b/docs/how-to/immich/immich-app-on-ringtail.md deleted file mode 100644 index 51b619d..0000000 --- a/docs/how-to/immich/immich-app-on-ringtail.md +++ /dev/null @@ -1,91 +0,0 @@ ---- -title: Immich App on Ringtail -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - immich ---- - -# Immich App on Ringtail - -Bring up `immich-server`, `immich-machine-learning`, and -`immich-valkey` on ringtail. This card stands the stack up against -the *new* pg cluster — it does not move user traffic. Cutover lives -in [[immich-cutover-and-decommission]]. - -## What to do - -- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix - matches the `-ringtail` convention used by other apps). 
Port from - `argocd/manifests/immich/`: - - `deployment-server.yaml` — point `DB_HOSTNAME` at the ringtail - pg service. - - `deployment-ml.yaml` — use `runtimeClassName: nvidia` + a - `resources.limits` for `nvidia.com/gpu: 1`. Use the `-cuda` tag - of the immich-ml image (set in kustomization). Ringtail is - single-node, so no node selector needed. See - `argocd/manifests/frigate/` for the existing GPU pod pattern. - - **GPU contention discovery:** ringtail's `nvidia-device-plugin` - is configured with `timeSlicing.replicas: 2`. Frigate + Ollama - already consume both virtual slices. Adding immich-ml requires - bumping the count to >= 3. Edit - `argocd/manifests/nvidia-device-plugin/configmap.yaml` (or - wherever the device-plugin config lives) and re-sync the - `nvidia-device-plugin` ArgoCD app. The plugin pod restarts and - the new advertised count appears as the node's - `nvidia.com/gpu` allocatable. - - `deployment-valkey.yaml` — straight port, BUT use the upstream - multi-arch `docker.io/valkey/valkey:` image — do NOT - use the `registry.ops.eblu.me/blumeops/valkey` rewrite in the - kustomization. That mirror was built on indri (arm64) and is - single-arch; pulling it on ringtail (amd64) gets `exec format - error` in CrashLoopBackOff. The mirror should eventually carry - a multi-arch tag, at which point the rewrite can return. - - `service*.yaml` — straight port. - - `pvc-ml-cache.yaml` — straight port (empty `local-path` PVC). - - `pv-nfs.yaml` + `pvc.yaml` — already covered by - [[sifaka-nfs-from-ringtail]] (may live in this dir or theirs). - - `ingress-tailscale.yaml` — ProxyGroup ingress, **must not** set - an explicit `host:` (or use `host: *`) per the lesson on - ProxyGroup VIP routing. - **Hostname collision warning:** the minikube ingress claims the - Tailscale device name `photos` (`tls.hosts: [photos]`). Two - devices on the tailnet cannot share that name. 
While the - ringtail deployment is being staged it must use a *different* - `tls.hosts` value (e.g. `photos-ringtail`) so it can coexist - with the running minikube one. The flip to `photos` happens at - cutover time, *after* the minikube ingress has been removed. - See [[immich-cutover-and-decommission#Cutover sequence]]. - - `kustomization.yaml` — same `images:` block (server, ML, valkey). -- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting - ringtail, namespace `immich`. **Manual sync only** until the - cutover. -- Existing `argocd/apps/immich.yaml` (minikube) stays untouched - during this card — both apps exist briefly. - -## Bring it up against a copy of the DB - -Use the throwaway/test path from [[immich-pg-data-migration#Dry run -before real cutover]]: point the ringtail immich at the *test* pg -cluster first, verify the pod boots, the web UI loads (via -`kubectl port-forward`), assets list, ML embeddings query. Then -tear it down. - -## Verification - -- All three pods Ready. -- ML pod has a GPU attached: `nvidia-smi` inside the container shows - the 4080. -- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in - logs). -- A `kubectl port-forward` to the server service shows the Immich - web UI. - -## Out of scope - -- Public/tailnet routing flip. Caddy still points at the minikube - Tailscale ingress until [[immich-cutover-and-decommission]]. -- Removing the minikube immich. Same. diff --git a/docs/how-to/immich/immich-cutover-and-decommission.md b/docs/how-to/immich/immich-cutover-and-decommission.md deleted file mode 100644 index b44fddd..0000000 --- a/docs/how-to/immich/immich-cutover-and-decommission.md +++ /dev/null @@ -1,103 +0,0 @@ ---- -title: Immich Cutover and Decommission -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - immich - - migration ---- - -# Immich Cutover and Decommission - -The user-visible flip. 
By the time this card opens, the ringtail -stack has been proven against a copy of the data. This card does the -real cutover. - -## Pre-cutover checklist - -- [[immich-pg-data-migration]] dry-run succeeded; method is chosen. -- Ringtail immich stack has been brought up against the test pg, - pods healthy, UI loaded ([[immich-app-on-ringtail#Verification]]). -- Borgmatic just ran successfully (a fresh nightly archive is a - belt-and-suspenders fallback, on top of the live source pg). -- User has been told to stop uploading from the iOS app for the - cutover window. - -## Cutover sequence - -1. **Quiesce source.** `kubectl --context=minikube-indri -n immich - scale deploy/immich-server --replicas=0` and same for ML. Leave - valkey + pg running. Confirm no client traffic on the source pg - via `pg_stat_activity`. -2. **Tear down the minikube Tailscale ingress.** The `photos` - Tailscale device name must be freed before ringtail's ingress can - claim it (Tailscale enforces uniqueness across the tailnet). - `kubectl --context=minikube-indri -n immich delete ingress - immich-tailscale` and wait for the corresponding `tailscale`-LB - StatefulSet pod to terminate. Verify the `photos` device is gone: - `tailscale status | grep -i photos` from any tailnet host. -3. **Final sync.** Per chosen method in - [[immich-pg-data-migration]]: - - Option A: promote the ringtail replica. - - Option B: take final `pg_dump`, restore to ringtail - `immich-pg`. -4. **Verify.** Run the row-count and schema-diff checks from - [[immich-pg-data-migration#Verification on the real run]]. -5. **Flip the ringtail ingress to `photos`.** Update - `argocd/manifests/immich-ringtail/ingress-tailscale.yaml`: - `tls.hosts: [photos]` (was `[photos-ringtail]` during staging per - [[immich-app-on-ringtail]]). Commit, `argocd app sync - immich-ringtail`. Wait for the `photos` device to register on the - tailnet again. -6. 
**Bring up ringtail immich** against the now-promoted pg - (`argocd app sync immich-ringtail`). Wait for Ready. -7. **Flip routing.** Update Caddy on indri - (`ansible/roles/caddy/defaults/main.yml`): `photos.ops.eblu.me` - upstream changes to the ringtail Tailscale ingress hostname - (`photos` — same MagicDNS name, now pointing to the ringtail - proxy). `mise run provision-indri -- --tags caddy`. -8. **Smoke test.** Open `photos.ops.eblu.me` in a browser. Sign in. - Scroll the timeline. Open an album. Trigger an ML search. -9. **Update borgmatic.** If the Tailscale hostname for pg changed, - update `borgmatic.cfg` on indri to point at the ringtail - `immich-pg-tailscale` service. Run a manual backup to verify. - -## After cutover - -- `argocd app set immich --revision ` is no longer relevant; - the minikube `immich` app gets deleted entirely. -- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and - the minikube `argocd/manifests/databases/immich-pg.yaml` + - `external-secret-immich-borgmatic.yaml` + - `service-immich-pg-tailscale.yaml`. -- Rename `immich-ringtail` back to `immich` (the `-ringtail` suffix - was scaffolding for the dual-cluster window; once minikube is - empty of immich, the unsuffixed name is clean). -- Confirm the minikube `immich-pg` PVC is no longer used, then - delete it (the PV with `Retain` policy will persist — clean that - up too). - -## Verification (definition of done) - -- `photos.ops.eblu.me` works for a real session, including ML search. -- Source minikube has no `immich` pods, no `immich-pg`, no PVCs. -- Memory pressure on minikube has dropped (≥1.5 GiB reclaimed). Check - `docker stats minikube` on indri. -- Nightly borgmatic run after the cutover completes successfully, - with the immich-pg archive showing the new source. - -## Rollback (within the cutover window) - -If smoke test fails: flip Caddy back, scale ringtail immich to 0, -scale source immich back up. Source pg was never destroyed. 
File a -plan reset on the relevant prerequisite card and try again next -session. - -## Out of scope - -- Decommissioning all of minikube. This chain just removes immich. - Other tenants migrate in their own chains as part of the broader - indri-k8s decommission. See [[migrate-immich-to-ringtail]] for - context. diff --git a/docs/how-to/immich/immich-pg-data-migration.md b/docs/how-to/immich/immich-pg-data-migration.md deleted file mode 100644 index fb87783..0000000 --- a/docs/how-to/immich/immich-pg-data-migration.md +++ /dev/null @@ -1,79 +0,0 @@ ---- -title: Immich Postgres Data Migration -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - postgres - - immich - - critical ---- - -# Immich Postgres Data Migration - -**This is the data-loss surface of the migration.** Pick a method, -prove it on a throwaway copy first, then run the real cutover. - -## Decision: pick one - -### Option A — CNPG `externalCluster` bootstrap (preferred) - -Stand the ringtail cluster up as a streaming replica of the minikube -cluster via `bootstrap.pg_basebackup.source`. Replica catches up -online; when ready, promote it and point Immich at it. This is -CNPG's documented PG-to-PG migration path and gives near-zero data -loss (the WAL position at promote == the position at app stop). - -Requires: network path from ringtail to minikube's pg over the -tailnet (the existing `immich-pg-tailscale` Service works), and a -superuser secret minikube-side exposed to ringtail's basebackup. - -Pitfall to plan around: the ringtail Cluster CR will need its -`bootstrap` block rewritten *after* promotion (CNPG doesn't -gracefully drop the externalCluster reference). Account for this in -[[immich-pg-on-ringtail]] — it may force a reset of that card. - -### Option B — pg_dump / pg_restore - -Stop immich, `pg_dump -Fc` from minikube, scp to ringtail, restore. 
-Simpler but full downtime for the whole dump+restore window -(measure on a copy first — VectorChord indexes are slow to rebuild). -Smaller blast radius; no streaming-replication moving parts. - -Use this if Option A hits any blocker. Data loss should still be -zero if the source is stopped first. - -### Option C — leave pg on minikube - -Rejected. See goal card [[migrate-immich-to-ringtail#Why postgres on -ringtail (not cross-cluster)]]. - -## Dry run before real cutover - -Whichever option wins: - -1. Snapshot the minikube `immich-pg` PVC or take a fresh `pg_dump` - into a scratch location. -2. Restore into a *separate* ringtail CNPG cluster (different name, - e.g. `immich-pg-test`) and point a scratch immich-server pod at - it. -3. Verify: pod boots, can list assets, ML embeddings query without - error, face thumbnails render. VectorChord-backed queries should - not error. -4. Tear the scratch cluster down before doing the real one. - -## Verification on the real run - -- Row counts match for `assets`, `albums`, `users`, `face`, - `asset_face`, `smart_search` (the embedding table) — script this. -- `pg_dump --schema-only --no-owner` diff between source and dest - should be empty modulo CNPG-managed roles. -- Immich `/api/server-info/version` and `/api/server-info/statistics` - return sane numbers. - -## Rollback - -If the cutover fails verification: stop the ringtail immich, repoint -ArgoCD `immich.destination` back to minikube, re-sync. Source pg was -never deleted. Document what failed and reset the chain. 
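
For reference, Option A above could look roughly like the following Cluster CR. This is a minimal sketch assuming CNPG's documented `bootstrap.pg_basebackup` and `externalClusters` fields; the external cluster label, the source hostname, the Secret name, `instances`, and `sslmode` are assumptions, not values taken from the repo. Extension and role configuration is left out for brevity.

```yaml
# Sketch only: ringtail Cluster bootstrapped as a streaming replica of the
# minikube cluster (Option A). Anything marked "hypothetical" is an assumption.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1                          # assumption; not stated in the card
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi                          # minikube size; bump if usage is tight
    storageClass: local-path
  bootstrap:
    pg_basebackup:
      source: immich-pg-minikube        # must match an externalClusters name
  replica:
    enabled: true                       # keep streaming until promotion
    source: immich-pg-minikube
  externalClusters:
    - name: immich-pg-minikube          # hypothetical label for the source
      connectionParameters:
        host: immich-pg-source.example.ts.net  # hypothetical tailnet hostname
        user: postgres
        dbname: postgres
        sslmode: require                # assumption
      password:
        name: immich-pg-source-superuser       # hypothetical Secret on ringtail
        key: password
```

The `bootstrap` and `replica` blocks are the parts the pitfall above refers to: they would need to be revisited once the replica has been promoted.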
diff --git a/docs/how-to/immich/immich-pg-on-ringtail.md b/docs/how-to/immich/immich-pg-on-ringtail.md deleted file mode 100644 index 10c7072..0000000 --- a/docs/how-to/immich/immich-pg-on-ringtail.md +++ /dev/null @@ -1,69 +0,0 @@ ---- -title: Immich Postgres Cluster on Ringtail -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - postgres - - immich ---- - -# Immich Postgres Cluster on Ringtail - -Stand up a fresh `immich-pg` CNPG Cluster on ringtail, ready to receive -data. **No data import yet** — that's [[immich-pg-data-migration]]. - -## What to do - -- Create `argocd/manifests/databases-ringtail/` (or pick another - namespace name — verify what other ringtail pg clusters will use; - if none yet, `databases` is fine). -- Port these from the minikube side: - - `immich-pg.yaml` — CNPG Cluster CR. Same image - (`ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0`), same - extensions, same managed `borgmatic` role. Bump `storage.size` if - the minikube 10 GiB looks tight (check actual usage first). - `storageClass: local-path` on ringtail (default). - - `external-secret-immich-borgmatic.yaml` — same 1Password item, - same field, but referencing the ringtail `ClusterSecretStore` - (`onepassword-blumeops` already exists per the - `external-secrets-ringtail` app). - - Service for in-cluster access (the operator creates `immich-pg-rw` - etc. automatically; verify the app deployment uses those names). - - A Tailscale Service if we want backups to keep working via the - same hostname during the transition — see "Borgmatic" below. -- New ArgoCD app `argocd/apps/databases-ringtail.yaml` pointing at - the new path, destination ringtail. - -## Verification - -- Cluster reaches `Ready`. -- `borgmatic` role exists, `rolcanlogin=t`, and is a member of - `pg_read_all_data` (via `managed.roles[].inRoles`). -- ExternalSecret `immich-pg-borgmatic` syncs from 1Password - (`Ready: True`) and the rendered Secret has `username=borgmatic`. 
-- The `vchord`, `vector`, `cube`, `earthdistance` extensions show - installed in the `postgres` database (`\dx` from - `psql -U postgres`). They are NOT installed in the `immich` - database at this point — `postInitSQL` in CNPG's `initdb` block - runs against the `postgres` superuser database. The Immich app - itself creates the extensions in its own `immich` database at - startup; do not be alarmed by their absence pre-immich-deploy. - The `vchord.so` library is preloaded via - `shared_preload_libraries` regardless, so `CREATE EXTENSION` at - app startup just registers it in the right database. - -## Borgmatic implications - -`borgmatic.cfg` on indri targets `immich-pg-tailscale` over the -tailnet. During migration both clusters will exist briefly. Decide -upfront: backup the *source* pg until cutover, then flip borgmatic -to the ringtail Tailscale service. Document the flip in -[[immich-cutover-and-decommission]]. - -## Out of scope - -- Importing data. That is [[immich-pg-data-migration]], which may - drive a reset on this card if the migration approach (e.g. CNPG - `externalCluster` bootstrap) requires changes to this Cluster CR. diff --git a/docs/how-to/immich/migrate-immich-to-ringtail.md b/docs/how-to/immich/migrate-immich-to-ringtail.md deleted file mode 100644 index e654b62..0000000 --- a/docs/how-to/immich/migrate-immich-to-ringtail.md +++ /dev/null @@ -1,134 +0,0 @@ ---- -title: Migrate Immich to Ringtail -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - immich - - migration ---- - -# Migrate Immich to Ringtail - -Move the entire Immich stack (server, ML, valkey, postgres) off -`minikube-indri` and onto `k3s-ringtail`. This is the first concrete -chain in the broader indri-k8s decommission: minikube is -memory-saturated (97% RAM, swapping), and Immich is the single -largest tenant (~1.5 GiB resident). 
- -## End state - -- Immich `server`, `machine-learning`, and `valkey` Deployments run on - ringtail k3s in the `immich` namespace. -- The `immich-machine-learning` pod uses ringtail's RTX 4080 via the - `nvidia-device-plugin` (performance win — currently CPU-only on - minikube). -- A CNPG `immich-pg` Cluster (PostgreSQL 17 + VectorChord) runs in a - `databases` namespace on ringtail, owned by the `cnpg-system` - operator on ringtail. -- The photo library still lives on [[sifaka]] at `/volume1/photos`, - mounted via NFS from ringtail pods (RWX). -- Routing: `photos.ops.eblu.me` (Caddy on indri) proxies to a - Tailscale ProxyGroup ingress on ringtail. No public surface today. -- The ArgoCD `immich` app's `destination.server` points at - `https://ringtail.tail8d86e.ts.net:6443`. The old minikube - manifests are removed. - -## Non-goals - -- Public exposure via Fly. Immich stays tailnet-only. -- Changing the immich version or runtime configuration. This is a - lift-and-shift; bumps come later. -- Backing up to a different target. [[borgmatic]] keeps running on - indri (it pulls via Tailscale and uses sifaka SMB for the library). - -## Critical constraint: no data loss - -Downtime is acceptable (Immich is a single-user system; we can take -it offline for the cutover). **Data loss is not.** Two surfaces matter: - -1. **Postgres** — face data, ML embeddings (vectors), album state, - sharing, etc. Re-derivable in theory; weeks of recompute in - practice. See [[immich-pg-data-migration]]. -2. **Library files** — `/volume1/photos`. Not moving, but the NFS - path must be verified accessible from ringtail before cutover. - See [[sifaka-nfs-from-ringtail]]. - -[[borgmatic]] backs both up to sifaka + BorgBase nightly; restore is -possible but slow. Treat it as a fallback, not a plan. - -## Why postgres on ringtail (not cross-cluster) - -`immich-pg` already has a Tailscale Service we could point ringtail -at, leaving the DB on minikube. 
We're not doing that because: - -- The whole goal is to retire minikube — keeping pg there blocks it. -- Immich is chatty against pg; tailnet round-trips would hurt. -- CNPG is the same operator on both sides — a Cluster CR on ringtail - is mechanically equivalent. - -## Approach - -This is a C2 Mikado chain. The prerequisite cards each represent a -distinct surface that has to work before cutover. See -[[agent-change-process#C2 — Mikado Chain]] for the discipline. - -## Workflow note: registering new ArgoCD apps during the chain - -This chain adds three new ArgoCD `Application` definitions in -`argocd/apps/`: `cloudnative-pg-ringtail`, `databases-ringtail`, -and (later) `immich-ringtail`. The usual C1/C2 pattern of -`argocd app set --revision && argocd app sync ` -does NOT work for the app-of-apps `apps` Application itself, because -`apps` self-manages: it re-reads `apps.yaml` (which declares -`targetRevision: main`) on every sync and reverts the override. As a -result, new app definitions added on a feature branch are never -visible to the cluster via `apps`. - -**Use `kubectl apply` to register each new Application directly:** - -```fish -kubectl --context=minikube-indri apply -f argocd/apps/.yaml -``` - -This creates the Application resource out-of-band, bypassing `apps`. - -For apps whose source lives in **this** repo (e.g. -`databases-ringtail`, `immich-ringtail` — manifest paths exist only -on the branch until merge), follow the apply with a branch override: - -```fish -argocd app set --revision mikado/migrate-immich-to-ringtail -argocd app sync -``` - -For apps whose source is an **external** repo at a pinned tag (e.g. -`cloudnative-pg-ringtail` → `mirrors/cloudnative-pg` `v1.27.1`), no -override is needed — the source revision is independent of this PR. 
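
As an illustration of the external-repo case, an Application pinned to a tag might look like the sketch below. The name, tag, destination, namespace, and sync options come from the prerequisite cards; `repoURL`, `path`, `project`, and the `argocd` namespace are assumptions and should be replaced with whatever the existing minikube app uses.

```yaml
# Sketch only: an Application whose source is an external mirror at a pinned
# tag, so no branch override is needed while the PR is open.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg-ringtail
  namespace: argocd                     # assumption
spec:
  project: default                      # assumption
  source:
    repoURL: https://forge.example/mirrors/cloudnative-pg.git  # hypothetical URL
    targetRevision: v1.27.1             # pinned tag, independent of this branch
    path: releases                      # hypothetical path inside the mirror
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: cnpg-system
  syncPolicy:
    syncOptions:
      - ServerSideApply=true            # CRDs exceed the annotation size limit
      - CreateNamespace=true
```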
- -After PR merge: - -```fish -argocd app set --revision main -argocd app sync -``` - -`apps` itself, on its next sync from `main`, will discover the new -Application definitions in `argocd/apps/` and adopt the already-running -resources without disruption — provided their in-cluster spec matches -the on-disk definitions (which it does because we applied the same -file). - -## Related - -- [[migrate-wave1-ringtail]] — the next chain in the indri-k8s - decommission: paperless, teslamate, and mealie -- [[shower-on-ringtail]] — a previous migration to ringtail (simpler: - no upstream cluster, SQLite, no GPU) -- [[connect-to-postgres]] — getting a psql session against CNPG -- [[ringtail]] — the target cluster -- [[cnpg-on-ringtail]], [[immich-pg-on-ringtail]], - [[immich-pg-data-migration]], [[sifaka-nfs-from-ringtail]], - [[immich-app-on-ringtail]], [[immich-cutover-and-decommission]] — - the prerequisite cards diff --git a/docs/how-to/immich/sifaka-nfs-from-ringtail.md b/docs/how-to/immich/sifaka-nfs-from-ringtail.md deleted file mode 100644 index 2c490c1..0000000 --- a/docs/how-to/immich/sifaka-nfs-from-ringtail.md +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: Sifaka NFS Photos from Ringtail -modified: 2026-05-13 -last-reviewed: 2026-05-13 -tags: - - how-to - - operations - - storage - - nfs - - sifaka ---- - -# Sifaka NFS Photos from Ringtail - -The Immich library lives at `sifaka:/volume1/photos` and is mounted -into the pod via an NFS PV (see `argocd/manifests/immich/pv-nfs.yaml`). -That PV is currently scoped to indri. We need ringtail to mount the -same path with the same RWX semantics, without breaking the existing -indri mount during the transition. - -## What to verify / do - -- Check `sifaka` DSM NFS rules for the `photos` share. Per - [[shower-on-ringtail#NFS + SMB share on sifaka]] convention, rules - use `192.168.1.0/24` + `100.64.0.0/10` with - `all_squash`/`Map all users to admin`. 
The existing rule may - already cover ringtail (it's on `192.168.1.21` per the recent - static-IP pin). If so this card is a verification card. -- If the rule is locked to indri's IP: add an entry for ringtail - (192.168.1.21) or widen to the subnet pattern above. -- Test mount from a ringtail debug pod (busybox or alpine with - nfs-utils) against the `photos` share. Read a file. Write a temp - file. Delete it. -- Watch for the known sifaka NFS-over-Tailscale gotcha: sifaka's - Tailscale must be in TUN mode (not userspace) for NFS to work - reliably over the tailnet. The NFS path here goes over the LAN - (not tailnet), so this shouldn't bite, but worth confirming the - NFS traffic is on `192.168.1.x` not `100.x`. - -## PV + PVC on ringtail - -- New `pv-nfs.yaml` mirroring the minikube one (name can be shared - if the PV is cluster-scoped — but PVs are per-cluster, so just - duplicate). Same `server: sifaka`, same path, same - `accessModes: [ReadWriteMany]`, `persistentVolumeReclaimPolicy: - Retain`. -- New `pvc.yaml` in the ringtail `immich` namespace bound to it. -- The minikube PVC stays bound and active until cutover — both - clusters can have the share NFS-mounted simultaneously (NFS RWX - permits this). Immich itself must not be running on both sides - at once. - -## Verification - -- A pod on ringtail can `ls /mnt/photos/` and see the same files - as the indri pod. -- File written from ringtail pod is visible from indri pod and - vice versa (proves there's no caching surprise). - -## Out of scope - -- Migrating photo files. Nothing moves; this is just adding a second - NFS client. -- The `pvc-ml-cache.yaml` PVC (a separate ML model cache). That's - not on NFS — it's a regular PVC. Recreated empty on ringtail in - [[immich-app-on-ringtail]]; the first ML pod boot will repopulate - it. 
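
The "PV + PVC on ringtail" section above could translate into manifests along these lines. This is a hedged sketch: the server, path, access mode, and reclaim policy are from the card, while the object names and the capacity figure are placeholders (NFS does not enforce the capacity value).

```yaml
# Sketch only: static NFS PV and a PVC bound to it by name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-photos-nfs               # hypothetical name
spec:
  capacity:
    storage: 1Ti                        # placeholder; not enforced for NFS
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                  # keep the default provisioner out of it
  nfs:
    server: sifaka
    path: /volume1/photos
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: immich-photos                   # hypothetical name
  namespace: immich
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""                  # must match the PV for static binding
  volumeName: immich-photos-nfs         # bind to the PV above explicitly
  resources:
    requests:
      storage: 1Ti
```

Setting `storageClassName: ""` on both objects is a deliberate choice here: ringtail's default class is `local-path`, and an empty class keeps the claim from being dynamically provisioned instead of binding to the NFS volume.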
diff --git a/docs/how-to/operations/read-compliance-reports.md b/docs/how-to/operations/read-compliance-reports.md index e676ad5..75fd3ab 100644 --- a/docs/how-to/operations/read-compliance-reports.md +++ b/docs/how-to/operations/read-compliance-reports.md @@ -80,7 +80,7 @@ Not all failures require action. Common expected failures in our minikube cluste 1. **Triage** — review new failures, distinguish real issues from expected noise 2. **Remediate** — fix what you can (pod security contexts, RBAC tightening) -3. **Mutelist** — suppress expected/accepted failures by adding a Resource entry under the matching Check in `argocd/manifests/prowler/mutelist/*.yaml` with a free-form `Description` explaining why +3. **Mutelist** — suppress expected/accepted failures via Prowler's `--mutelist-file` to reduce noise in future scans 4. **Track** — compare reports over time to spot regressions ## Related diff --git a/docs/how-to/operations/record-review-evidence.md b/docs/how-to/operations/record-review-evidence.md new file mode 100644 index 0000000..9de4e37 --- /dev/null +++ b/docs/how-to/operations/record-review-evidence.md @@ -0,0 +1,50 @@ +--- +title: Record Review Evidence +modified: 2026-04-01 +last-reviewed: 2026-04-01 +tags: + - how-to + - security + - compliance +--- + +# Record Review Evidence + +How review evidence *would* be captured after a [[review-compensating-controls|compensating control review]], to make the review auditable under a compliance framework. + +blumeops does not currently collect review evidence. This card documents the target process for reference and practice. + +## Why Record Evidence? + +Reviewing a control and updating `last-reviewed` proves the review *happened* but not *what was checked*. Under frameworks like PCI DSS v4.0, a QSA needs to see dated, immutable evidence that the reviewer verified the control and that an appropriate party accepted the residual risk. 
Compliance platforms like Drata automate this collection, but the underlying artifacts are the same whether you use a platform or a directory of files. + +## What Evidence Would Be Captured + +For each control reviewed, artifacts should answer: + +1. **Who reviewed it** — reviewer name, date +2. **What was verified** — the specific checks performed (e.g., Tailscale ACL policy snapshot, `tailscale status` output, kubectl auth checks) +3. **What was found** — the outcome: control still in effect, circumstances changed, or control invalidated +4. **Residual risk** — what the control does *not* cover (the gap a QSA will ask about) +5. **Acceptance** — formal sign-off that the residual risk is accepted by an appropriate party (reviewer + approver, typically a manager or CTO) + +Supporting artifacts would include command output, policy snapshots, screenshots, or API responses — anything that demonstrates the verification was actually performed. + +## PCI DSS Context + +Under PCI DSS v4.0, compensating controls require a **Compensating Control Worksheet (CCW)** that maps each control to the original requirement it substitutes for. The CCW fields are: + +- **Original requirement** — the specific PCI DSS requirement not directly met +- **Constraint** — why direct compliance isn't feasible +- **Compensating control definition** — what is done instead +- **Risk addressed** — how the control mitigates the original threat +- **Residual risk** — what remains unmitigated +- **Validation procedure** — steps to verify (what `notes` captures in `compensating-controls.yaml`) + +Req 12.3.2 mandates review **at least annually** (quarterly is typical for Level 1 Service Providers). In a platform like Drata, these map to Controls with uploaded Evidence and review workflows requiring sign-off from both the reviewer and an approver. 
+ +## Related + +- [[review-compensating-controls]] — The technical review process +- [[security]] — Security posture overview +- [[read-compliance-reports]] — Interpreting Prowler/Kingfisher reports diff --git a/docs/how-to/operations/review-compensating-controls.md b/docs/how-to/operations/review-compensating-controls.md new file mode 100644 index 0000000..8a32d98 --- /dev/null +++ b/docs/how-to/operations/review-compensating-controls.md @@ -0,0 +1,80 @@ +--- +title: Review Compensating Controls +modified: 2026-03-30 +last-reviewed: 2026-03-30 +tags: + - how-to + - security + - maintenance +--- + +# Review Compensating Controls + +How to periodically review compensating controls that justify suppressed security findings. + +## Review by Staleness + +Show controls sorted by when they were last reviewed (most stale first): + +```bash +mise run review-compensating-controls +``` + +This reads `compensating-controls.yaml` (repo root), sorts by `last-reviewed`, and displays the most stale control with all codebase references. It also searches for every file that references the control ID, so you can see exactly which suppressed findings depend on it. + +To show more entries: + +```bash +mise run review-compensating-controls --limit 20 +``` + +## What is a Compensating Control? + +A compensating control is a security measure that mitigates the risk a finding was designed to detect, when the finding itself cannot be directly remediated. For example: + +- **Finding:** API server does not enable AlwaysPullImages admission plugin +- **Risk:** Untrusted users could run pods using cached images they shouldn't have access to +- **Compensating control:** `single-user-cluster` — only the operator has kubectl access; no untrusted users can create pods + +Controls are documented in `compensating-controls.yaml` and referenced from security tool configurations (Prowler mutelist files, Kingfisher config, etc.) using the format `CC: `. 
+ +A compensating control is only one of three structurally distinct ways to suppress a finding — see [[compliance-mute-categories]] for when to reach for a CC versus a not-applicable (`NA:`) or risk-accepted (`RA:`) tag instead. + +## Review Process + +For each control up for review: + +1. **Understand the risk.** Read each suppressed finding that references this control. What attack or misconfiguration does the original check guard against? + +2. **Verify the control is in effect.** Follow the verification steps in the control's `notes` field. For example, for `tailscale-network-isolation`, check that the cluster is not directly internet-exposed and Tailscale ACLs are enforced. + +3. **Assess whether the control actually mitigates the risk.** A compensating control should address the same threat the check was designed to catch, not just be a vaguely related security measure. If it doesn't hold up, either: + - Fix the underlying finding and remove the suppression + - Document a stronger or more specific compensating control + +4. **Check for changed circumstances.** Has the cluster gained new users? Has a service been exposed publicly? Has an operator added native support for the missing feature? Any of these could invalidate the control. + +5. **Update the review date.** Edit `compensating-controls.yaml` and set `last-reviewed` to today's date. Commit alongside any changes. + +## Adding a New Control + +When suppressing a new security finding, either map it to an existing control or add a new one: + +```yaml +- id: my-new-control + description: >- + What this control does and how it mitigates the specific risk. + created: 2026-03-30 + last-reviewed: 2026-03-30 + notes: >- + How to verify this control is still in effect. +``` + +Then reference it in the suppression configuration with `CC: my-new-control`. 
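
For a Prowler mutelist, the reference might look like the sketch below. It assumes Prowler's documented mutelist layout (`Mutelist` > `Accounts` > `Checks`, with an optional `Description`); the check ID and resource name are hypothetical placeholders, and the wildcard account and region values are assumptions rather than the repo's actual settings.

```yaml
# Sketch only: a muted finding that points back at its compensating control.
Mutelist:
  Accounts:
    "*":                                # assumption: apply to any cluster
      Checks:
        "example_check_id":             # hypothetical; use the real check ID
          Regions:
            - "*"
          Resources:
            - "example-resource"        # hypothetical resource name
          Description: "CC: my-new-control"
```

Because `mise run review-compensating-controls` searches the codebase for the control ID, keeping the ID verbatim in the `Description` is what lets the review task find this suppression.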
+ +## Related + +- [[record-review-evidence]] — Capturing evidence artifacts for audit (aspirational) +- [[security]] — Security posture overview +- [[read-compliance-reports]] — Accessing and interpreting Prowler reports +- [[review-services]] — Periodic service version review (similar staleness pattern) diff --git a/docs/how-to/operations/run-1password-backup.md b/docs/how-to/operations/run-1password-backup.md index 0dc9ec9..b0807da 100644 --- a/docs/how-to/operations/run-1password-backup.md +++ b/docs/how-to/operations/run-1password-backup.md @@ -26,17 +26,19 @@ How to export and encrypt your 1Password vaults for inclusion in [[borgmatic]] b 1. Open the 1Password desktop app 2. **File > Export > All Vaults** 3. Choose **1PUX** format -4. Save to `~/Documents/` — 1Password names the file `1PasswordExport--.1pux` automatically; don't bother renaming it, pass the path to the task in the next step +4. Save to `~/Documents/1Password-export.1pux` ### 2. Run the Backup Task -Pass the exported file's path: - ```fish -mise run op-backup ~/Documents/1PasswordExport-*.1pux +mise run op-backup ``` -(If only one export exists in `~/Documents/`, the glob expands cleanly. Otherwise, paste the full path.) +Or, if you saved the export to a non-default location: + +```fish +mise run op-backup ~/path/to/export.1pux +``` The task will: diff --git a/docs/how-to/ringtail/migrate-wave1-ringtail.md b/docs/how-to/ringtail/migrate-wave1-ringtail.md deleted file mode 100644 index ffb8cdc..0000000 --- a/docs/how-to/ringtail/migrate-wave1-ringtail.md +++ /dev/null @@ -1,176 +0,0 @@ ---- -title: Migrate Wave 1 (paperless, teslamate, mealie) to Ringtail -modified: 2026-06-03 -last-reviewed: 2026-06-03 -tags: - - how-to - - operations - - ringtail - - migration ---- - -# Migrate Wave 1 to Ringtail - -Move paperless, teslamate, and mealie off `minikube-indri` and onto -`k3s-ringtail`. 
This is the load-shedding response to minikube going -OOM: the kernel OOM killer was thrashing the 8 GiB node — killing -`kube-apiserver`, `dockerd`, and the argocd application-controller — -which made every minikube-hosted service probe-flap at once. These -three app pods are ~1.1 GiB resident combined and are the heaviest -non-observability tenants left on minikube. Following -[[migrate-immich-to-ringtail]], the first chain in the indri-k8s -decommission. - -## End state - -- `paperless`, `teslamate`, and `mealie` run on ringtail k3s in their - own namespaces, off minikube entirely. -- A CNPG `blumeops-pg` Cluster runs in a `databases` namespace on - ringtail (PostgreSQL, owned by ringtail's `cnpg-system` operator), - holding the `paperless` and `teslamate` databases. Apps reach it - in-cluster via `blumeops-pg-rw.databases.svc.cluster.local`. -- mealie keeps its SQLite database; its 2 GiB `mealie-data` PVC is - copied to a ringtail PVC. -- paperless media still lives on [[sifaka]] via NFS (RWX, 500 GiB), - mounted from ringtail pods. teslamate has no file state. -- Routing: `paperless.ops.eblu.me`, `teslamate.ops.eblu.me`, and - `mealie.ops.eblu.me` (Caddy on indri) proxy to Tailscale - ProxyGroup ingresses on ringtail. Service names are unchanged. -- The minikube manifests and the `paperless`/`teslamate`/`mealie` - databases inside indri's `blumeops-pg` are removed only after - cutover is verified. - -## Non-goals - -- Migrating the rest of `blumeops-pg` (e.g. miniflux) — that is a - later wave. This chain moves only the paperless + teslamate - databases out; the source cluster on indri stays up for the others. -- Version bumps or config changes. Lift-and-shift only. -- Public (Fly) exposure changes. These stay tailnet-only. -- The observability stack (prometheus/loki/tempo/grafana) — deferred; - it carries 50 GiB of local TSDB and is the riskiest move. 
- -## Critical constraint: no data loss - -**Downtime is acceptable — data loss is not.** We can take each -service fully offline for its cutover, which removes the entire -class of streaming-replication and double-writer hazards. The cold -dump is taken from a *quiesced* source, so it is internally -consistent. - -Data surfaces: - -1. **paperless postgres** — document metadata, tags, correspondents, - the search index state. The document *files* are on NFS and never - move, but losing the DB means files-without-index. This is the - surface to protect most carefully. -2. **teslamate postgres** — drive/charge history. Re-derivable only - from Tesla's API for a limited window; treat as unrecoverable. -3. **mealie SQLite** — recipes, meal plans. On the `mealie-data` PVC. - -The source databases on indri are **never dropped until the ringtail -side is verified and serving**. Rollback is "repoint and scale back -up," not "restore from backup." [[borgmatic]] remains the backstop. - -## Why a fresh CNPG cluster (not cross-cluster pg) - -indri's `blumeops-pg` is already exposed tailnet-wide at -`pg.ops.eblu.me` (Caddy L4), so we *could* leave the DBs on indri and -just move the app pods. We are not, because: - -- The goal is to retire minikube — keeping pg there blocks it and - leaves a cross-host runtime dependency (ringtail apps SPOF on - indri's pg over the tailnet). -- CNPG is the same operator on both clusters; a Cluster CR on ringtail - is mechanically equivalent to the one on minikube. -- Naming the ringtail cluster `blumeops-pg` in `databases` lets apps - use the same in-cluster DNS they would on indri. - -## Cold-cutover procedure (per service) - -Do these one service at a time. paperless first (heaviest, highest -data-sensitivity), then teslamate, then mealie. - -### 0. Prerequisites (once, before any service) - -- Confirm ringtail's `cnpg-system` operator and `databases` namespace - are healthy (immich-pg already runs there). 
-- Confirm ringtail pods can reach indri's `pg.ops.eblu.me:5432` (used - only to pull the dump) and the sifaka NFS export for paperless - media. See [[sifaka-nfs-from-ringtail]]. -- Define the ringtail `blumeops-pg` CNPG Cluster manifest (model on - `databases-ringtail/immich-pg.yaml`) and its ExternalSecrets for - the per-app roles. Sync it; let it come up empty and healthy. - -### 1. Quiesce the source - -```fish -kubectl --context=minikube-indri -n scale deploy/ --replicas=0 -# confirm 0 running, DB now has no writers -``` - -### 2. Dump from indri, restore to ringtail (postgres apps) - -```fish -# dump the single app DB from the quiesced source -kubectl --context=minikube-indri -n databases exec blumeops-pg-1 -- \ - pg_dump -Fc -d > /tmp/.dump - -# restore into the ringtail cluster -kubectl --context=k3s-ringtail -n databases exec -i blumeops-pg-1 -- \ - pg_restore --no-owner --role= -d < /tmp/.dump -``` - -For **mealie** (SQLite) instead: copy the `mealie-data` PVC contents -to the ringtail PVC (e.g. a one-shot rsync pod mounting both, or -`kubectl cp` via a helper pod). Verify the `.db` file size and that -mealie boots read-only against it. - -### 3. Verify the restore (before any routing flips) - -- Row counts match source for the key tables, scripted: - - paperless: `documents_document`, `documents_tag`, - `documents_correspondent`, `auth_user`. - - teslamate: `cars`, `drives`, `charging_processes`, `positions`. -- `pg_dump --schema-only --no-owner` diff between source and dest is - empty modulo CNPG-managed roles. -- Boot the app against the ringtail DB on its tailnet name *before* - Caddy is flipped, and smoke-test (paperless: documents list + - search; teslamate: dashboard loads recent drives; mealie: recipes - list). - -### 4. Release the service name - -```fish -# delete the minikube tailscale ingress so ringtail can claim the name -kubectl --context=minikube-indri -n delete ingress -tailscale -``` - -### 5. 
Bring up on ringtail - -- Apply the ringtail manifests (new ArgoCD app `-ringtail`, - `destination.server` = `https://ringtail.tail8d86e.ts.net:6443`). - App points at `blumeops-pg-rw.databases.svc.cluster.local`. -- Sync; wait for healthy + the ProxyGroup ingress to get its name. - -### 6. Flip routing - -- Repoint the Caddy `.ops.eblu.me` upstream at the ringtail - ProxyGroup ingress (provision-indri, caddy role). -- `mise run services-check` — confirm the service flips from FIRING - to OK and no neighbours regressed. - -### 7. Decommission the source (only after verification) - -- Remove the minikube manifests for the app. -- Drop the app DB from indri's `blumeops-pg` (paperless/teslamate) - **last**, once the ringtail side has served real traffic. - -## Rollback - -If a cutover fails verification at any step before §7: - -- Re-create the minikube tailscale ingress (if §4 ran). -- Scale the minikube app back to `1`. -- Repoint Caddy back to the minikube ingress. -- The source DB was never modified or dropped. Document the failure. diff --git a/docs/reference/infrastructure/indri.md b/docs/reference/infrastructure/indri.md index 8364ba0..cbb2a0f 100644 --- a/docs/reference/infrastructure/indri.md +++ b/docs/reference/infrastructure/indri.md @@ -1,7 +1,6 @@ --- title: Indri -modified: 2026-05-27 -last-reviewed: 2026-05-27 +modified: 2026-02-19 tags: - infrastructure - host @@ -16,7 +15,6 @@ Primary BlumeOps server. Mac Mini M1 (2020). | Property | Value | |----------|-------| | **Model** | Mac mini M1, 2020 (Macmini9,1) | -| **CPU / RAM** | 8 cores / 16 GB | | **Storage** | 2TB internal SSD | | **macOS** | 15.7.3 (Sequoia) | | **Tailscale hostname** | `indri.tail8d86e.ts.net` | @@ -32,13 +30,9 @@ Primary BlumeOps server. Mac Mini M1 (2020). 
- [[borgmatic]] - Backup system - [[alloy|Alloy]] - Metrics/logs collector - [[caddy]] - Reverse proxy for `*.ops.eblu.me` -- [[devpi]] - PyPI mirror (LaunchAgent) -- [[hephaestus]] - heph task/context sync hub (LaunchAgent, self-updating) -- [[cv]] - Static CV site, served by Caddy -- [[docs]] - Quartz-built docs site, served by Caddy **Kubernetes (via minikube):** -- [[apps|Most k8s applications]]. A growing set of apps (Authentik, Frigate, ntfy, Immich, Homepage, Shower, Kingfisher, alloy-ringtail) now run on [[ringtail]]'s k3s instead. Long-term plan is to decommission indri's minikube entirely. +- [[apps|Most k8s applications]] (Frigate, ntfy migrated to [[ringtail]] k3s) **GUI Applications (manual start required):** - Docker Desktop - Container runtime for minikube diff --git a/docs/reference/infrastructure/ringtail.md b/docs/reference/infrastructure/ringtail.md index a4e6837..8b93d4d 100644 --- a/docs/reference/infrastructure/ringtail.md +++ b/docs/reference/infrastructure/ringtail.md @@ -25,19 +25,6 @@ Service host and gaming PC. Custom-built PC running NixOS. | **OS** | NixOS 25.11 (Sway/Wayland) | | **Tailscale hostname** | `ringtail.tail8d86e.ts.net` | -## Networking - -| Property | Value | -|----------|-------| -| **Interface (wired)** | `enp5s0` | -| **IP** | `192.168.1.21/24` (static, set by NixOS scripted networking) | -| **Gateway** | `192.168.1.1` (UX7) | -| **DNS** | `192.168.1.1`, `1.1.1.1` (used as Tailscale's upstream resolvers; `/etc/resolv.conf` is owned by Tailscale's MagicDNS at `100.100.100.100`) | -| **DHCP reservation** | UniFi "Fixed IP" tied to ringtail's MAC; belt-and-suspenders so the UX7 won't lease `192.168.1.21` to anyone else even though ringtail no longer asks for it | -| **Wireless** | `wlp6s0` still managed by NetworkManager as a fallback path | - -NetworkManager is enabled but explicitly excluded from managing `enp5s0` via `networking.networkmanager.unmanaged = [ "interface-name:enp5s0" ]`. 
The wired address is configured by a deterministic `network-addresses-enp5s0.service` oneshot — no daemon, no lease, no renewal. - ## Software Managed declaratively via `nixos/ringtail/configuration.nix`. Home-manager handles ringtail-specific sway/waybar config; chezmoi manages cross-platform dotfiles. diff --git a/docs/reference/kubernetes/cluster.md b/docs/reference/kubernetes/cluster.md index 07c14af..9b632bd 100644 --- a/docs/reference/kubernetes/cluster.md +++ b/docs/reference/kubernetes/cluster.md @@ -1,7 +1,6 @@ --- title: Cluster -modified: 2026-06-04 -last-reviewed: 2026-06-04 +modified: 2026-02-19 tags: - kubernetes --- @@ -16,7 +15,7 @@ BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most ser |----------|-------| | **Driver** | docker | | **Container Runtime** | docker | -| **Kubernetes Version** | v1.35.0 | +| **Kubernetes Version** | v1.34.0 | | **CPUs** | 6 | | **Memory** | 11GB | | **Disk** | 200GB | @@ -42,9 +41,7 @@ Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringt |----------|-------| | **Context** | `k3s-ringtail` | | **API Server** | `https://ringtail.tail8d86e.ts.net:6443` | -| **Workloads** | GPU workloads (Frigate, Ollama), notifications (ntfy, frigate-notify), [[authentik]], and services migrated off indri minikube (Immich, Mealie, Paperless, TeslaMate). See [[ringtail]] for the authoritative list. | - -Services are being progressively migrated from indri's minikube to ringtail's k3s; the split above reflects an in-progress state, not a fixed boundary. +| **Workloads** | Frigate (GPU), ntfy, frigate-notify, nvidia-device-plugin | ## Related diff --git a/docs/reference/operations/security.md b/docs/reference/operations/security.md index 11c4df9..18561a5 100644 --- a/docs/reference/operations/security.md +++ b/docs/reference/operations/security.md @@ -46,7 +46,13 @@ Security posture and compliance scanning for BlumeOps infrastructure. 
All compliance scan reports are stored on `sifaka:/volume1/reports/`. See [[read-compliance-reports]] for access and interpretation. -Suppressed findings are kept in Prowler mutelist YAML under `argocd/manifests/prowler/mutelist/`. Each entry's `Description` field explains why the finding is muted; entries are reviewed ad-hoc rather than on a scheduled cadence. +## Compensating controls + +Suppressed findings reference named compensating controls tracked in `compensating-controls.yaml` (repo root). Each control has a review date and verification steps. See [[review-compensating-controls]] for the review process. + +```bash +mise run review-compensating-controls +``` ## Known gaps diff --git a/docs/reference/services/1password.md b/docs/reference/services/1password.md index 5ad50da..4489194 100644 --- a/docs/reference/services/1password.md +++ b/docs/reference/services/1password.md @@ -1,7 +1,6 @@ --- title: 1Password -modified: 2026-05-22 -last-reviewed: 2026-05-22 +modified: 2026-02-10 tags: - service - secrets @@ -9,22 +8,15 @@ tags: # 1Password -Root credential store for all BlumeOps secrets. Kubernetes workloads read items via [[external-secrets|External Secrets Operator]]; humans and agents read via the `op` CLI. +Root credential store for all BlumeOps secrets, synced to Kubernetes via External Secrets Operator. -## Vaults - -| Vault | Purpose | -|-------|---------| -| `blumeops` | Infrastructure secrets — referenced by ExternalSecret manifests and scripts. | -| `Personal` | Human login credentials keyed by URL for autofill. Not consumed by infrastructure. 
| - -## Kubernetes Integration +## Architecture ``` 1Password Cloud | v -1Password Connect (namespace: 1password, deployed on both indri and ringtail) +1Password Connect (namespace: 1password) | v External Secrets Operator (namespace: external-secrets) @@ -33,15 +25,15 @@ External Secrets Operator (namespace: external-secrets) Native Kubernetes Secrets ``` -**ClusterSecretStore:** `onepassword-blumeops` (same name on both clusters). +## Vault -Services reference 1Password items via `ExternalSecret` manifests. Both `minikube-indri` and `k3s-ringtail` run their own `onepassword-connect` deployment talking to the same vault. +The `blumeops` vault contains all infrastructure credentials. -## Direct Access +## Kubernetes Integration -Prefer `op read "op://vault/item/field"` over `op item get --fields` in scripts and IaC — `op item get --fields` wraps multi-line values in quotes, corrupting them. `op item get` without flags is fine for exploring item metadata. +**ClusterSecretStore:** `onepassword-blumeops` -If an item name contains special characters (e.g. parentheses), use the item ID instead of the name in the `op://` path. +Services reference 1Password items via `ExternalSecret` manifests. 
## Disaster Recovery Backup @@ -49,9 +41,8 @@ The `mise run op-backup` task encrypts a `.1pux` vault export and transfers it t ## Related -- [[external-secrets]] — Kubernetes operator that consumes ClusterSecretStore -- [[argocd]] — Uses secrets for git access -- [[postgresql]] — Database credentials -- [[run-1password-backup]] — Periodic backup procedure -- [[restore-1password-backup]] — Recovery from backup -- [[borgmatic]] — Backup system +- [[argocd]] - Uses secrets for git access +- [[postgresql]] - Database credentials +- [[run-1password-backup]] - Periodic backup procedure +- [[restore-1password-backup]] - Recovery from backup +- [[borgmatic]] - Backup system diff --git a/docs/reference/services/alloy.md b/docs/reference/services/alloy.md index 97d1e77..d781f2f 100644 --- a/docs/reference/services/alloy.md +++ b/docs/reference/services/alloy.md @@ -1,7 +1,6 @@ --- title: Alloy -modified: 2026-06-04 -last-reviewed: 2026-06-04 +modified: 2026-03-13 tags: - service - observability @@ -21,10 +20,10 @@ Unified observability collector for metrics and logs with three deployments: | **Indri Binary** | `~/.local/bin/alloy` | | **Indri Config** | `~/.config/grafana-alloy/config.alloy` | | **K8s Namespace** | `alloy` | -| **K8s Image** | `registry.ops.eblu.me/blumeops/alloy:v1.16.0-9564435` (locally built) | +| **K8s Image** | `grafana/alloy:v1.14.0` | | **ArgoCD App** | `alloy-k8s` | | **Fly.io Config** | `fly/alloy.river` | -| **Fly.io Image** | `grafana/alloy:v1.16.1` (binary copied into nginx container, sha-pinned) | +| **Fly.io Image** | `grafana/alloy:v1.5.1` (binary copied into nginx container) | ## Metrics Collected diff --git a/docs/reference/services/borgmatic.md b/docs/reference/services/borgmatic.md index 37f1a60..fea4551 100644 --- a/docs/reference/services/borgmatic.md +++ b/docs/reference/services/borgmatic.md @@ -25,7 +25,7 @@ Daily backup system using Borg backup, running on indri. 
## What Gets Backed Up **Directories:** -- `~/code/personal/zk` - Zettelkasten (migrating into heph docs; see [hephaestus](https://github.com/eblume/hephaestus)) +- `~/code/personal/zk` - Zettelkasten - `/opt/homebrew/var/forgejo` - Git forge data - `~/.config/borgmatic` - Borgmatic config - `~/Documents` - Personal documents diff --git a/docs/reference/services/hephaestus.md b/docs/reference/services/hephaestus.md deleted file mode 100644 index 7abc35b..0000000 --- a/docs/reference/services/hephaestus.md +++ /dev/null @@ -1,141 +0,0 @@ ---- -title: Hephaestus -modified: 2026-06-04 -last-reviewed: 2026-06-04 -tags: - - service - - hephaestus ---- - -# Hephaestus - -[hephaestus](https://github.com/eblume/hephaestus) (`heph`) is the user's -self-hosted task + context/knowledge system. It is **hub-and-spoke**: each device -runs a full local SQLite replica (`hephd --mode local`) and background-syncs -against one canonical **hub**. Indri runs that hub. - -## Quick Reference - -| Property | Value | -|----------|-------| -| **PWA URL** | https://heph.ops.eblu.me (browser PWA, Caddy TLS) | -| **Spoke sync URL** | http://indri.tail8d86e.ts.net:8787 (direct, tailnet) | -| **Local Port** | 8787 (`hephd --mode server`, bound `0.0.0.0`) | -| **Binary** | `~/.cargo/bin/hephd` (self-updating) | -| **Data** | `~/.local/share/heph/heph.db` | -| **PWA shell** | `~/.local/share/heph/web` | -| **Logs** | `~/Library/Logs/mcquack.heph.{out,err}.log` | -| **LaunchAgent** | `mcquack.eblume.heph` | -| **Ansible role** | `ansible/roles/heph` (tag `heph`) | - -## What runs on indri - -The launchagent runs the hub in server mode with three features enabled: - -``` -hephd --mode server --http-addr 0.0.0.0:8787 --db ~/.local/share/heph/heph.db - --web-root ~/.local/share/heph/web - --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ - --oidc-audience heph - --self-update --self-update-interval-secs 600 -``` - -- **Server mode** exposes the HTTP sync endpoint (`/rpc`, `/sync/*`) that 
spokes - reconcile their op-log against. -- **Self-update** (10-minute poll) rebuilds `hephd` from the forge when a newer - release tag appears (`cargo install --git https://forge.eblu.me/eblume/hephaestus.git`). - Indri's Rust toolchain (`~/.cargo/bin`) is on the agent's `PATH` for this, and - the plist pins `RUSTUP_TOOLCHAIN=stable` — the - launchagent runs without mise, so a bare `cargo` shim would otherwise fall back - to rustup's *default* toolchain, which can lag behind heph's `rust-version` floor - (1.89) and silently fail the build. -- **PWA** (`--web-root`) serves the [heph-pwa] mobile shell; Caddy terminates TLS - at `heph.ops.eblu.me` so the PWA runs in a secure context (service worker, - install-to-home-screen, voice capture). - -[heph-pwa]: https://github.com/eblume/hephaestus - -The hub binds `0.0.0.0` so tailnet spokes can also sync directly -(`http://indri.tail8d86e.ts.net:8787`); access is gated by Authentik OIDC either -way — tailnet reachability alone is not enough. - -## Authentication (Authentik OIDC, device-code) - -The hub verifies an OIDC bearer token on every sync. The `heph` application is a -**public** OAuth2 client using the **device-code flow** (RFC 8628), provisioned -in the [[authentik]] blueprint (`argocd/manifests/authentik/configmap-blueprint.yaml`): - -- Issuer: `https://authentik.ops.eblu.me/application/o/heph/` -- Audience / client id: `heph` -- Restricted to the `admins` group (single-owner, sensitive data). -- Scope mappings: `openid`, `email`, `profile`, **`offline_access`**. - -> **`offline_access` is required for durable sync.** The `heph` CLI requests -> `scope = "openid offline_access"`, and a refresh token is only issued for the -> 30-day refresh-token window when the provider actually grants `offline_access`. 
-> Without that scope mapping the refresh token is bound to the login **session**; -> once the session lapses, hephd's `refresh_token` grant returns `400 Bad -> Request`, the bearer can't be refreshed, and spoke sync silently degrades -> (`heph sync --status` → `auth_failure: true`). `heph auth login` papers over it -> until the next session expiry. Keep `offline_access` in the provider's -> `property_mappings`. - -Because no Authentik instance ships a device-code flow by default, the blueprint -also creates `default-device-code-flow` and binds it to the default brand's -`flow_device_code`. Devices obtain a token with `heph auth login`; the PWA -currently takes a pasted token (in-app device-code login is upstream follow-up). - -## Data seeding (Path A, one-time) - -The hub was seeded from the existing `gilbert` device so no task history was -lost. heph's data-safe bring-up ("Path A") has the hub **adopt the device's -identity** rather than rewriting the device: - -1. Quiesce the seed device: `heph daemon stop` (on gilbert). -2. Copy its store to indri: `scp ~/.local/share/heph/heph.db indri:~/.local/share/heph/heph.db`. -3. Give the hub its **own device origin** (keeps gilbert's `owner_id` + data; - `hephd` regenerates a fresh `origin` on next start when it is missing): - ```fish - ssh indri "sqlite3 ~/.local/share/heph/heph.db \"DELETE FROM meta WHERE key='origin';\"" - ``` -4. `mise run provision-indri -- --tags heph` (installs hephd, stages the PWA, - loads the launchagent → hub starts on the seeded store). - -Only `meta.origin` changes; `owner_id`, nodes, op-log, and links are copied -untouched. A clean `hephd --owner-id` / seed command is tracked upstream as -hephaestus follow-up — until then this manual reset is the documented path. - -## Connecting a spoke (e.g. 
gilbert) - -A device joins by running its local daemon with the hub URL + OIDC client and -logging in once: - -```bash -hephd --mode local --hub-url http://indri.tail8d86e.ts.net:8787 \ - --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \ - --oidc-client-id heph -heph auth login --hub-url http://indri.tail8d86e.ts.net:8787 \ - --issuer https://authentik.ops.eblu.me/application/o/heph/ --client-id heph -``` - -> **Use the direct `http://…:8787` tailnet URL for sync, not the Caddy HTTPS -> URL.** hephd's sync client is plain-HTTP-only; pointing `--hub-url` at -> `https://heph.ops.eblu.me` fails with a confusing `error sending request` -> (the HTTP connector rejects the `https` scheme before connecting). Tailscale -> encrypts the transport, and the OIDC bearer token still gates every request. -> `heph.ops.eblu.me` (Caddy TLS) exists only for the browser PWA, which needs a -> secure context. The cached token is keyed by the exact `--hub-url`, so use the -> same value for `hephd` and `heph auth login`. - -> **Caveat:** `heph daemon` cannot yet bake hub/spoke flags into the generated -> launchd plist (upstream gap). On a spoke whose plist is managed by `heph -> daemon`, the hub/OIDC flags must be hand-added — and a later `heph daemon -> start/restart` will regenerate the plist and drop them. Avoid `heph daemon` -> subcommands on a configured spoke until that gap is closed; reload via -> `launchctl` instead. - -## Related - -- [[indri]] — host -- [[authentik]] — OIDC provider -- [[caddy]] — TLS termination for `heph.ops.eblu.me` diff --git a/docs/reference/services/ntfy.md b/docs/reference/services/ntfy.md index 1bf45af..b549a6d 100644 --- a/docs/reference/services/ntfy.md +++ b/docs/reference/services/ntfy.md @@ -1,7 +1,6 @@ --- title: Ntfy -modified: 2026-06-04 -last-reviewed: 2026-06-04 +modified: 2026-02-17 tags: - service - notifications @@ -18,7 +17,7 @@ Self-hosted push notification service. 
Ntfy receives HTTP POST messages and deli | **URL** | https://ntfy.ops.eblu.me | | **Tailscale URL** | https://ntfy.tail8d86e.ts.net | | **Namespace** | `ntfy` | -| **Image** | `registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix` (locally built) | +| **Image** | `binwiederhier/ntfy:v2.17.0` | | **Upstream** | https://github.com/binwiederhier/ntfy | | **Manifests** | `argocd/manifests/ntfy/` | diff --git a/docs/reference/services/tempo.md b/docs/reference/services/tempo.md index 5eb5d87..771b97f 100644 --- a/docs/reference/services/tempo.md +++ b/docs/reference/services/tempo.md @@ -1,7 +1,6 @@ --- title: Tempo -modified: 2026-06-04 -last-reviewed: 2026-06-04 +modified: 2026-03-05 tags: - service - observability @@ -19,7 +18,7 @@ Distributed tracing backend for BlumeOps infrastructure. Receives traces via OTL | **Tailscale URL** | https://tempo.tail8d86e.ts.net | | **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net | | **Namespace** | `monitoring` | -| **Image** | `registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4` (locally built) | +| **Image** | `grafana/tempo:2.10.1` | | **Storage** | 10Gi PVC (local filesystem) | | **Retention** | 7 days | diff --git a/docs/reference/services/zot.md b/docs/reference/services/zot.md index b01a6ce..d00a200 100644 --- a/docs/reference/services/zot.md +++ b/docs/reference/services/zot.md @@ -56,9 +56,8 @@ The `zot-ci` API key expires every **90 days**. To rotate: 5. Generate a new API key, copy it to clipboard 6. Update 1Password: ```fish - set -l NEWKEY (pbpaste); op item edit "Forgejo Secrets" --vault blumeops "zot-ci-api[password]=$NEWKEY"; set -e NEWKEY + pbpaste | op item edit "Forgejo Secrets" --vault blumeops "zot-ci-api[password]=-" ``` - The value is briefly visible to other `ps`-readers on this machine (single-user mac, acceptable tradeoff). The older `pbpaste | op item edit ... 
"field[password]=-"` stdin syntax was rejected by op 2.34 as "invalid JSON" — recent op versions treat piped input as a full JSON template. 7. Sync to Forgejo: `mise run provision-indri -- --tags forgejo_actions_secrets` ## Related diff --git a/docs/reference/storage/backups.md b/docs/reference/storage/backups.md index 2dfbae4..14dbcea 100644 --- a/docs/reference/storage/backups.md +++ b/docs/reference/storage/backups.md @@ -22,7 +22,7 @@ Daily automated backups from [[indri]] to [[sifaka|Sifaka]] NAS. | Path | Description | Priority | |------|-------------|----------| -| `~/code/personal/zk` | Zettelkasten notes (migrating into heph docs) | Critical | +| `~/code/personal/zk` | Zettelkasten notes | Critical | | `/opt/homebrew/var/forgejo` | Git repositories | Critical | | `~/.config/borgmatic` | Backup config | High | | `~/Documents` | Personal documents (includes [[1password]] encrypted export) | High | diff --git a/docs/reference/tools/mise-tasks.md b/docs/reference/tools/mise-tasks.md index b614cb1..4ec3438 100644 --- a/docs/reference/tools/mise-tasks.md +++ b/docs/reference/tools/mise-tasks.md @@ -69,6 +69,7 @@ Run `mise tasks --sort name` for the live list with descriptions. |------|-------------| | `services-check` | Check all services are online and responding | | `service-review` | Review the most stale service for version freshness | +| `blumeops-tasks` | List tasks from Todoist sorted by priority | | `op-backup` | Encrypt 1Password export and send to indri for borgmatic | ## Infrastructure Setup diff --git a/docs/tutorials/ai-assistance-guide.md b/docs/tutorials/ai-assistance-guide.md index 4f0c595..3ee1ffa 100644 --- a/docs/tutorials/ai-assistance-guide.md +++ b/docs/tutorials/ai-assistance-guide.md @@ -98,6 +98,7 @@ BlumeOps operations are driven by mise tasks. 
Run `mise tasks` to list all avail | `provision-indri` | Deploy changes to [[indri]]-hosted services via Ansible | | `services-check` | After deployments - verify all services are healthy | | `pr-comments` | Check unresolved PR comments during review | +| `blumeops-tasks` | Find pending tasks from Todoist | | `container-list` | View available container images and tags | | `container-build-and-release` | Trigger container build workflows | | `dns-preview` | Preview DNS changes before applying | @@ -110,8 +111,6 @@ BlumeOps operations are driven by mise tasks. Run `mise tasks` to list all avail | `docs-review` | Review the most stale doc by last-reviewed date | | `runner-logs` | View Forgejo workflow logs (indri or ringtail runner) | -For task discovery, BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`), not Todoist. List outstanding work with `heph list --project Blumeops --json`. - For ArgoCD operations, use the `argocd` CLI directly: - `argocd app diff ` - Preview changes - `argocd app sync ` - Deploy changes diff --git a/docs/tutorials/contributing.md b/docs/tutorials/contributing.md index 0d48e8f..a2a7069 100644 --- a/docs/tutorials/contributing.md +++ b/docs/tutorials/contributing.md @@ -11,7 +11,7 @@ tags: > **Audiences:** Contributor -This tutorial walks through making your first contribution to BlumeOps - from understanding the codebase to submitting a pull request. +This tutorial walks through making your first contribution to BluemeOps - from understanding the codebase to submitting a pull request. 
## Prerequisites diff --git a/docs/tutorials/expose-service-publicly.md b/docs/tutorials/expose-service-publicly.md index 65af611..886cad4 100644 --- a/docs/tutorials/expose-service-publicly.md +++ b/docs/tutorials/expose-service-publicly.md @@ -376,13 +376,6 @@ Mitigations for dynamic services: - fail2ban on indri (see below) can block IPs showing abuse patterns - The break-glass shutoff remains the last resort -The most acute version of this in practice has been **AI scrapers**, which -ignore `robots.txt` and crawl dynamic services (notably [[forgejo|Forgejo]]'s -infinite git-history URL space) into both a surprise egress bill and an -effective L7 DoS. See [[ai-scraper-mitigation]] for the incident, the tiered -defense (mirror black-hole, user-agent denylist, Anubis proof-of-work), and -why a Cloudflare Tunnel is *not* the chosen answer here. - If a publicly exposed dynamic service attracts targeted attacks or the home network bandwidth is impacted, consider migrating to Cloudflare Tunnel for enterprise-grade DDoS protection (requires DNS migration; diff --git a/docs/tutorials/replicating-blumeops.md b/docs/tutorials/replicating-blumeops.md index e54ecb2..f2ed8ca 100644 --- a/docs/tutorials/replicating-blumeops.md +++ b/docs/tutorials/replicating-blumeops.md @@ -1,7 +1,6 @@ --- title: Replicating BlumeOps -modified: 2026-05-11 -last-reviewed: 2026-05-11 +modified: 2026-02-07 tags: - tutorials - replication @@ -11,7 +10,7 @@ tags: > **Audiences:** Replicator -This tutorial provides a roadmap for building your own homelab GitOps environment inspired by BlumeOps. It links to detailed component tutorials for each major piece. +This tutorial provides a roadmap for building your own homelab GitOps environment inspired by BluemeOps. It links to detailed component tutorials for each major piece. ## What You'll Build @@ -24,7 +23,7 @@ By following this guide, you'll have: ## Hardware Requirements -BlumeOps runs on modest hardware. 
At minimum: +BluemeOps runs on modest hardware. At minimum: | Component | BlumeOps Uses | Minimum Alternative | |-----------|---------------|---------------------| @@ -95,7 +94,7 @@ Without observability, you're flying blind. ### Phase 6: Your First Services -With the foundation in place, deploy actual workloads. BlumeOps runs: +With the foundation in place, deploy actual workloads. BluemeOps runs: - [[miniflux]] - RSS reader - [[jellyfin]] - Media server - [[immich]] - Photo management @@ -119,7 +118,7 @@ Protect your data. ## Alternative Approaches -BlumeOps makes specific choices that may not suit everyone: +BluemeOps makes specific choices that may not suit everyone: | BlumeOps Choice | Alternative | |-----------------|-------------| diff --git a/fly/Dockerfile b/fly/Dockerfile index 406c849..eae8c35 100644 --- a/fly/Dockerfile +++ b/fly/Dockerfile @@ -1,5 +1,5 @@ -# nginx 1.30.1-alpine -FROM nginx@sha256:c819f83c54b0361f5557601bf5eb4943d09360e7a7fdf426afc466570f45874d +# nginx 1.30.0-alpine +FROM nginx@sha256:0272e4604ed93c1792f03695a033a6e8546840f86e0de20a884bb17d2c924883 # Copy tailscale binaries from official image (v1.94.2) COPY --from=docker.io/tailscale/tailscale@sha256:95e528798bebe75f39b10e74e7051cf51188ee615934f232ba7ad06a3390ffa1 \ @@ -13,8 +13,8 @@ RUN mkdir -p /var/run/tailscale /var/lib/tailscale \ && apk add --no-cache fail2ban \ && rm -f /etc/fail2ban/jail.d/alpine-ssh.conf -# Copy Alloy binary from official image (v1.16.1, Ubuntu-based, needs libc6-compat) -COPY --from=docker.io/grafana/alloy@sha256:51aeb9d829239345070619dad3edd6873186f913c84f45b365b74574fcb38ec0 \ +# Copy Alloy binary from official image (v1.16.0, Ubuntu-based, needs libc6-compat) +COPY --from=docker.io/grafana/alloy@sha256:6e00cf7c5a692ff5f24844529416ed017d76fce922f8199004e73d5eca46b6b8 \ /bin/alloy /usr/local/bin/alloy RUN mkdir -p /var/log/nginx /etc/alloy /tmp/alloy-data @@ -25,7 +25,6 @@ COPY fail2ban/action.d/nginx-deny.conf /etc/fail2ban/action.d/nginx-deny.conf COPY 
nginx.conf /etc/nginx/nginx.conf COPY error.html /usr/share/nginx/html/error.html -COPY naughty.html /usr/share/nginx/html/naughty.html COPY alloy.river /etc/alloy/config.alloy COPY start.sh /start.sh RUN chmod +x /start.sh diff --git a/fly/fly.toml b/fly/fly.toml index 6ccf29d..11aac9c 100644 --- a/fly/fly.toml +++ b/fly/fly.toml @@ -7,7 +7,7 @@ primary_region = "sjc" memory = "512mb" [deploy] -strategy = "immediate" +strategy = "bluegreen" [http_service] internal_port = 8080 diff --git a/fly/naughty.html b/fly/naughty.html deleted file mode 100644 index b6eada8..0000000 --- a/fly/naughty.html +++ /dev/null @@ -1,61 +0,0 @@ - - - - - - - 403 · Roll of Dishonour - - - -
-

🪤 403 — you walked into the scraper trap

-

These are mirror repositories. They are tailnet-only.

- -

- This path used to serve the web UI for mirrors of public upstream - projects. It exists for supply-chain control, not for crawling. A - robots.txt politely disallowed /mirrors/. - A pack of AI scrapers ignored it, walked the infinite git-history URL - space, and ran up ~1.25 TB of egress and a real - money bill in a single month — while timing out the server for everyone - else. -

- -

So /mirrors/ is closed at the edge now. Roll of dishonour, - by share of the bytes they stole:

- - - - - - - - - -
| Operator | User-Agent |
|----------|------------|
| Meta | meta-externalagent |
| OpenAI | GPTBot |
| Amazon | Amazonbot |
| ByteDance | Bytespider |
- -

- If you are a human who actually wanted these mirrors, they are reachable - from the tailnet at forge.ops.eblu.me. If you are a crawler: - read the robots.txt next time. We left you a header, too. -

-
- - diff --git a/fly/nginx.conf b/fly/nginx.conf index ec35774..570e6c9 100644 --- a/fly/nginx.conf +++ b/fly/nginx.conf @@ -215,33 +215,6 @@ http { return 403 "API documentation is only available at forge.ops.eblu.me (tailnet).\n"; } - # Black-hole the mirror repositories on WAN. These are mirrors of - # already-public upstreams (tailscale, prometheus, mealie, …) kept - # for supply-chain control; CI, gilbert, and tailnet clients consume - # them via forge.ops.eblu.me. Their web UI served no public purpose - # but AI scrapers, which crawled the near-infinite git-history URL - # space (src/commit, commits, blame, raw) and drove ~70% of Fly - # egress (1.24 TB/30d → a surprise bill) plus enough upstream load to - # time out Forgejo. robots.txt already Disallows /mirrors/, but - # meta-externalagent and GPTBot ignore it — so enforce at the edge. - # `^~` makes this win over the regex locations below (e.g. *.css), so - # static assets under /mirrors/ can't leak through. We also name and - # shame: blocked requests get a "roll of dishonour" page (403 status - # preserved) and an X-Naughty-Scrapers header. See - # docs/explanation/ai-scraper-mitigation.md. - location ^~ /mirrors/ { - error_page 403 /naughty.html; - return 403; - } - - # Roll of dishonour — served on the /mirrors/ 403, status kept at 403. - location = /naughty.html { - internal; - root /usr/share/nginx/html; - add_header X-Naughty-Scrapers "OpenAI/GPTBot, Meta/meta-externalagent, Amazonbot, ByteDance/Bytespider — robots.txt ignorers" always; - add_header X-Clacks-Overhead "GNU Terry Pratchett" always; - } - # Redirect archive endpoints to tailnet — archive requests generate full # git bundles on demand. Unauthenticated crawlers hitting unique commit # SHAs cause unbounded CPU and disk usage (DoS vector). 
Legitimate users diff --git a/mise-tasks/blumeops-tasks b/mise-tasks/blumeops-tasks new file mode 100755 index 0000000..035aa3b --- /dev/null +++ b/mise-tasks/blumeops-tasks @@ -0,0 +1,216 @@ +#!/usr/bin/env -S uv run --script +# /// script +# requires-python = ">=3.12" +# dependencies = ["httpx==0.28.1", "rich==15.0.0"] +# /// +#MISE description="List Blumeops tasks from Todoist sorted by priority" +"""Fetch and display Blumeops tasks from Todoist, sorted by priority. + +This script is specific to Erich Blume's personal development workflow and +is not intended for general use. It requires: + + - A 1Password CLI (`op`) configured with access to the author's vault + - A Todoist account with a project named "Blumeops" + +The script fetches tasks and displays them sorted by a custom priority order: +p1 (urgent), p2 (high), p4 (normal/default), p3 (backlog). The p3-last ordering +reflects a deliberate choice to treat p3 as "backlog" rather than moderate +priority. + +Usage: mise run blumeops-tasks +""" + +import subprocess +import sys +from datetime import date + +import httpx +from rich.console import Console +from rich.markup import escape +from rich.text import Text + +TODOIST_API_BASE = "https://api.todoist.com/api/v1" +PROJECT_NAME = "Blumeops" + +# Priority mapping: Todoist API uses 1=normal(p4), 2=moderate(p3), 3=high(p2), 4=urgent(p1) +# User wants order: p1, p2, p4, p3 (p3 is backlog, goes last) +PRIORITY_LABELS = {4: "p1", 3: "p2", 1: "p4", 2: "p3"} +PRIORITY_SORT_ORDER = {4: 1, 3: 2, 1: 3, 2: 4} # Lower = earlier + + +def get_todoist_token() -> str: + """Retrieve Todoist API token from 1Password.""" + result = subprocess.run( + ["op", "read", "op://vg6xf6vvfmoh5hqjjhlhbeoaie/c53h3xnmswhvexa5mntoyvhgpm/credential"], + capture_output=True, + text=True, + ) + if result.returncode != 0: + raise RuntimeError(f"Failed to get Todoist token from 1Password: {result.stderr}") + return result.stdout.strip() + + +def get_project_id(client: httpx.Client, project_name: 
str) -> str: + """Find project ID by name.""" + cursor = None + while True: + params = {} + if cursor: + params["cursor"] = cursor + response = client.get(f"{TODOIST_API_BASE}/projects", params=params) + response.raise_for_status() + data = response.json() + for project in data.get("results", data if isinstance(data, list) else []): + if project["name"] == project_name: + return project["id"] + cursor = data.get("next_cursor") if isinstance(data, dict) else None + if not cursor: + break + + raise RuntimeError(f"Project '{project_name}' not found in Todoist") + + +def get_tasks(client: httpx.Client, project_id: str) -> list[dict]: + """Get all tasks for a project.""" + tasks = [] + cursor = None + while True: + params = {"project_id": project_id} + if cursor: + params["cursor"] = cursor + response = client.get(f"{TODOIST_API_BASE}/tasks", params=params) + response.raise_for_status() + data = response.json() + tasks.extend(data.get("results", data if isinstance(data, list) else [])) + cursor = data.get("next_cursor") if isinstance(data, dict) else None + if not cursor: + break + return tasks + + +def is_due(task: dict) -> bool: + """Check if a task should be displayed based on its due date. + + Tasks without a due date are always shown. Tasks with a due date + are only shown when the date is today or in the past. + """ + due = task.get("due") + if due is None: + return True + due_date = date.fromisoformat(due["date"][:10]) + return due_date <= date.today() + + +def days_until_due(task: dict) -> int | None: + """Return signed days offset from today, or None if no due date. + + Negative = days remaining before due (e.g. -2 = due in 2 days). + Positive = days past due (overdue). Zero = due today. + """ + due = task.get("due") + if due is None: + return None + due_date = date.fromisoformat(due["date"][:10]) + return (date.today() - due_date).days + + +def recurrence_string(task: dict) -> str | None: + """Return the Todoist natural-language recurrence string, or None. 
+ + Todoist's REST API doesn't expose RFC 5545 RRULE; the natural-language + `due.string` (e.g. "every monday", "every 2 weeks") is the terse form. + """ + due = task.get("due") + if due is None or not due.get("is_recurring"): + return None + return due.get("string") + + +def sort_tasks(tasks: list[dict]) -> list[dict]: + """Sort by overdue-ness, then priority. + + Most overdue first (largest +N); tasks with no due date come last. + Within a given day, tiebreaker is the custom priority order p1, p2, p4, p3. + """ + + def key(task: dict) -> tuple[int, int, int]: + days = days_until_due(task) + no_due = 1 if days is None else 0 + days_key = -(days if days is not None else 0) # descending + return (no_due, days_key, PRIORITY_SORT_ORDER.get(task["priority"], 5)) + + return sorted(tasks, key=key) + + +def main() -> int: + console = Console() + + # Get API token + try: + token = get_todoist_token() + except RuntimeError as e: + console.print(f"[red]Error:[/red] {e}") + return 1 + + # Create HTTP client with auth header + with httpx.Client(headers={"Authorization": f"Bearer {token}"}) as client: + # Find project + try: + project_id = get_project_id(client, PROJECT_NAME) + except RuntimeError as e: + console.print(f"[red]Error:[/red] {e}") + return 1 + + # Get, filter, and sort tasks + tasks = get_tasks(client, project_id) + tasks = [t for t in tasks if is_due(t)] + sorted_tasks = sort_tasks(tasks) + + if not sorted_tasks: + console.print("No tasks found in Blumeops project") + return 0 + + # Display tasks + console.print(f"[bold]Blumeops Tasks[/bold] ({len(sorted_tasks)} tasks)") + console.print("=" * 40) + console.print() + + for task in sorted_tasks: + priority = task["priority"] + label = PRIORITY_LABELS.get(priority, "p?") + content = task["content"] + description = task.get("description", "") + + # Header line with priority and content + header = Text() + header.append(f"[{label}]", style="bold") + header.append(f" {content}") + + meta = [] + days = 
days_until_due(task) + if days is not None: + if days == 0: + meta.append("due today") + elif days > 0: + meta.append(f"{days}d overdue") + else: + meta.append(f"due in {-days}d") + recurrence = recurrence_string(task) + if recurrence: + meta.append(f"↻ {recurrence}") + if meta: + header.append(f" ({', '.join(meta)})", style="dim") + console.print(header) + + # Description indented (escape rich markup to preserve brackets) + if description: + for line in description.split("\n"): + console.print(f" {escape(line)}", style="dim") + + console.print() + + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/mise-tasks/branch-cleanup b/mise-tasks/branch-cleanup index a538880..575c9a1 100755 --- a/mise-tasks/branch-cleanup +++ b/mise-tasks/branch-cleanup @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Delete branches that have been merged into main (local and remote)" #MISE alias="bc" diff --git a/mise-tasks/container-build-and-release b/mise-tasks/container-build-and-release index 85e6cb8..ba569e7 100755 --- a/mise-tasks/container-build-and-release +++ b/mise-tasks/container-build-and-release @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["typer==0.26.2", "httpx==0.28.1"] +# dependencies = ["typer==0.25.0", "httpx==0.28.1"] # /// #MISE description="Trigger container build workflows via Forgejo API" #USAGE arg "" help="Container name (directory under containers/)" diff --git a/mise-tasks/container-list b/mise-tasks/container-list index 7dad346..26639f2 100755 --- a/mise-tasks/container-list +++ b/mise-tasks/container-list @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# 
dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="List available containers and their recent tags" #USAGE arg "[name]" help="Optional container name to filter output" diff --git a/mise-tasks/container-version-check b/mise-tasks/container-version-check index 06f96ae..4ebe3b6 100755 --- a/mise-tasks/container-version-check +++ b/mise-tasks/container-version-check @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Validate container version consistency across container.py, Dockerfiles, nix derivations, and service-versions.yaml" #USAGE flag "--all-files" help="Check all containers, not just changed ones" diff --git a/mise-tasks/dns-acme-cleanup b/mise-tasks/dns-acme-cleanup index 3a53b11..432a6ce 100755 --- a/mise-tasks/dns-acme-cleanup +++ b/mise-tasks/dns-acme-cleanup @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Delete orphaned ACME challenge TXT records in eblu.me" #USAGE flag "--dry-run" help="List orphans without deleting" diff --git a/mise-tasks/docs-mikado b/mise-tasks/docs-mikado index c632e46..eea052f 100755 --- a/mise-tasks/docs-mikado +++ b/mise-tasks/docs-mikado @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "pyyaml==6.0.3", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="View active Mikado dependency chains for C2 changes" #USAGE arg "[card]" help="Card stem to show chain for" diff --git a/mise-tasks/docs-preview 
b/mise-tasks/docs-preview index 9e0bd16..faa79af 100755 --- a/mise-tasks/docs-preview +++ b/mise-tasks/docs-preview @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Build docs with Dagger and serve locally, opening to a specific card" #USAGE arg "" help="Card path relative to docs/, e.g. how-to/knowledgebase/review-documentation" diff --git a/mise-tasks/docs-review b/mise-tasks/docs-review index 12e301f..d07904d 100755 --- a/mise-tasks/docs-review +++ b/mise-tasks/docs-review @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Review the most stale documentation card by last-reviewed date" #USAGE flag "--limit " default="15" help="Number of docs to show in the table" diff --git a/mise-tasks/docs-review-stale b/mise-tasks/docs-review-stale index 0c5490e..4449213 100755 --- a/mise-tasks/docs-review-stale +++ b/mise-tasks/docs-review-stale @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["rich==15.0.0", "typer==0.26.2"] +# dependencies = ["rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Report docs by git-last-modified date, highlighting stale ones" #USAGE flag "--threshold " default="180" help="Days before a doc is considered stale" diff --git a/mise-tasks/mikado-branch-invariant-check b/mise-tasks/mikado-branch-invariant-check index 3135bf2..1f0fbcf 100755 --- a/mise-tasks/mikado-branch-invariant-check +++ b/mise-tasks/mikado-branch-invariant-check @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["rich==15.0.0", "typer==0.26.2"] +# 
dependencies = ["rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Validate Mikado Branch Invariant on mikado/* branches" #USAGE arg "[commit_msg_file]" help="Commit message file (passed by commit-msg hook)" diff --git a/mise-tasks/op-backup b/mise-tasks/op-backup index 7db033b..37a97a6 100755 --- a/mise-tasks/op-backup +++ b/mise-tasks/op-backup @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["rich==15.0.0", "typer==0.26.2"] +# dependencies = ["rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Encrypt a 1Password .1pux export and send to indri for borgmatic" #USAGE arg "[export_path]" help="Path to .1pux export file (prompted if omitted)" diff --git a/mise-tasks/pr-comments b/mise-tasks/pr-comments index 39d7c9a..7205617 100755 --- a/mise-tasks/pr-comments +++ b/mise-tasks/pr-comments @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="List unresolved comments on a PR" #USAGE arg "" help="Pull request number" diff --git a/mise-tasks/prune-ringtail-generations b/mise-tasks/prune-ringtail-generations index 2ad8dc8..2b8e3f9 100755 --- a/mise-tasks/prune-ringtail-generations +++ b/mise-tasks/prune-ringtail-generations @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["rich==15.0.0", "typer==0.26.2"] +# dependencies = ["rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Prune old NixOS generations on ringtail, preserving rollback safety" #MISE alias="prg" diff --git a/mise-tasks/review-compensating-controls b/mise-tasks/review-compensating-controls new file mode 100755 index 0000000..e92d302 --- /dev/null +++ b/mise-tasks/review-compensating-controls @@ -0,0 +1,229 @@ +#!/usr/bin/env -S uv run --script +# /// script +# 
requires-python = ">=3.12" +# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] +# /// +#MISE description="Review the most stale compensating control" +#USAGE flag "--limit " default="10" help="Number of controls to show in the table" +"""Review compensating controls by staleness. + +Reads ``compensating-controls.yaml`` and sorts by ``last-reviewed``. +Shows a staleness table, then displays the most stale control with all +references found in the codebase. + +After reviewing, update the control entry: + + last-reviewed: YYYY-MM-DD + +Usage: mise run review-compensating-controls [--limit 10] +""" + +import subprocess +import sys +from datetime import date +from pathlib import Path +from typing import Annotated + +import typer +import yaml +from rich.console import Console +from rich.panel import Panel +from rich.table import Table + +CONTROLS_FILE = Path(__file__).parent.parent / "compensating-controls.yaml" +REPO_ROOT = Path(__file__).parent.parent + + +def load_controls(path: Path) -> list[dict]: + data = yaml.safe_load(path.read_text()) + return data.get("controls", []) + + +def parse_date(raw) -> date | None: + if raw is None: + return None + if isinstance(raw, date): + return raw + try: + return date.fromisoformat(str(raw)) + except ValueError: + return None + + +def find_references(control_id: str) -> list[str]: + """Find all files referencing a control ID using ripgrep.""" + try: + result = subprocess.run( + ["rg", "--no-heading", "-n", control_id, str(REPO_ROOT)], + capture_output=True, + text=True, + timeout=10, + ) + lines = result.stdout.strip().splitlines() + # Exclude the controls file itself and this script + return [ + ln + for ln in lines + if "compensating-controls.yaml" not in ln + and "review-compensating-controls" not in ln + ] + except (FileNotFoundError, subprocess.TimeoutExpired): + return [] + + +def main( + limit: Annotated[ + int, typer.Option(help="Number of controls to show in the table") + ] = 10, +) -> None: + console = 
Console() + today = date.today() + + if not CONTROLS_FILE.exists(): + console.print( + f"[bold red]Controls file not found:[/bold red] {CONTROLS_FILE}" + ) + raise typer.Exit(code=1) + + controls = load_controls(CONTROLS_FILE) + + # Parse dates and build sortable entries + entries: list[tuple[dict, date | None]] = [] + for ctrl in controls: + reviewed = parse_date(ctrl.get("last-reviewed")) + entries.append((ctrl, reviewed)) + + # Sort: never-reviewed first, then oldest + entries.sort(key=lambda e: (e[1] is not None, e[1] or date.min)) + + never_reviewed = sum(1 for _, r in entries if r is None) + + # --- Summary panel --- + console.print() + console.print( + Panel( + f"[bold]{len(entries)}[/bold] compensating controls, " + f"[bold red]{never_reviewed}[/bold red] never reviewed", + title="[bold]Compensating Control Review Queue[/bold]", + border_style="cyan", + ) + ) + console.print() + + # --- Staleness table --- + table = Table(show_header=True, header_style="bold") + table.add_column("#", justify="right") + table.add_column("Control ID") + table.add_column("Last Reviewed", justify="right") + table.add_column("Age (days)", justify="right") + table.add_column("Refs", justify="right") + + for i, (ctrl, reviewed) in enumerate(entries[:limit], 1): + control_id = ctrl["id"] + refs = len(find_references(control_id)) + + if reviewed is None: + table.add_row( + str(i), + f"[red]{control_id}[/red]", + "[red]never[/red]", + "[red]—[/red]", + str(refs), + ) + else: + age = (today - reviewed).days + style = "yellow" if age > 90 else "" + id_str = f"[{style}]{control_id}[/{style}]" if style else control_id + date_str = f"[{style}]{reviewed}[/{style}]" if style else str(reviewed) + age_str = f"[{style}]{age}[/{style}]" if style else str(age) + table.add_row(str(i), id_str, date_str, age_str, str(refs)) + + remaining = len(entries) - limit + if remaining > 0: + table.add_row("", f"[dim]… {remaining} more[/dim]", "", "", "") + + console.print(table) + console.print() + + # --- 
Most stale control detail --- + if not entries: + console.print("[bold red]No controls found![/bold red]") + raise typer.Exit(code=1) + + top_ctrl, top_reviewed = entries[0] + control_id = top_ctrl["id"] + refs = find_references(control_id) + + detail_lines = [ + f"[bold cyan]{control_id}[/bold cyan]", + f"[dim]Last reviewed: {top_reviewed or 'never'}[/dim]", + "", + f"[bold]Description:[/bold] {top_ctrl.get('description', '').strip()}", + ] + notes = top_ctrl.get("notes", "").strip() + if notes: + detail_lines.append(f"[bold]Notes:[/bold] {notes}") + + console.print( + Panel( + "\n".join(detail_lines), + title="[bold]Up For Review[/bold]", + border_style="green", + ) + ) + console.print() + + # --- References --- + if refs: + ref_table = Table( + show_header=True, header_style="bold", title="References in codebase" + ) + ref_table.add_column("File", style="cyan") + ref_table.add_column("Line") + + for ref in refs: + # rg output: file:line:content + parts = ref.split(":", 2) + if len(parts) >= 3: + filepath = parts[0].replace(str(REPO_ROOT) + "/", "") + line_no = parts[1] + content = parts[2].strip() + ref_table.add_row(f"{filepath}:{line_no}", content) + else: + ref_table.add_row(ref, "") + + console.print(ref_table) + else: + console.print( + f"[yellow]No references to '{control_id}' found in the codebase.[/yellow]" + ) + console.print() + + # --- Review checklist --- + checklist = [ + "[bold]Verification:[/bold]\n", + f"• {notes}\n" if notes else "", + "\n[bold]Review each reference:[/bold]\n", + "• For each muted finding referencing this control, confirm:\n", + " 1. The risk the original check guards against\n", + " 2. That this control actually mitigates that risk\n", + " 3. 
That the control is still in effect (not degraded or bypassed)\n", + "\n[bold]After review:[/bold]\n", + f"• Update compensating-controls.yaml: [cyan]last-reviewed: {today}[/cyan]\n", + "• If the control is no longer valid, either:\n", + " - Fix the underlying finding and remove the mute, or\n", + " - Document a new/updated compensating control\n", + "• Commit the change", + ] + + console.print( + Panel( + "".join(checklist), + title="[bold yellow]Review Guidance[/bold yellow]", + border_style="yellow", + ) + ) + + +if __name__ == "__main__": + typer.run(main) diff --git a/mise-tasks/review-compliance-reports b/mise-tasks/review-compliance-reports index 24d2afc..bcbe090 100755 --- a/mise-tasks/review-compliance-reports +++ b/mise-tasks/review-compliance-reports @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["rich==15.0.0", "typer==0.26.2", "pyyaml==6.0.3"] +# dependencies = ["rich==15.0.0", "typer==0.25.0", "pyyaml==6.0.3"] # /// #MISE description="Summarize the latest Prowler and Kingfisher compliance reports from sifaka" #USAGE flag "--full" help="Show all unmuted failures, not just new ones" @@ -143,10 +143,7 @@ def _kubectl(args: str, timeout: int = 15) -> subprocess.CompletedProcess: def run_node_verification(console: Console) -> None: """Verify node-level conditions that Prowler reports as MANUAL. - Prowler runs inside a pod and can't evaluate kubelet file permissions, - kubelet config arguments, etcd CA separation, or cluster-admin RBAC - bindings. We SSH into the minikube node and check each condition here, - failing loudly if any deviates from expected values. 
+ Compensating control: node-config-automated-verification """ checks: list[tuple[str, str, bool]] = [] # (name, detail, passed) @@ -281,7 +278,7 @@ def run_node_verification(console: Console) -> None: table = Table( show_header=True, header_style="bold", - title="Node Verification (out-of-band checks for MANUAL findings)", + title="Node Verification (CC: node-config-automated-verification)", ) table.add_column("Check") table.add_column("Detail") @@ -531,8 +528,8 @@ def summarize_report( Panel( f"[bold yellow]{len(latest['unmuted'])} unmuted failure(s) " f"need triage.[/bold yellow]\n\n" - "For each: remediate, or add a Resource entry to the " - "matching check in argocd/manifests/prowler/mutelist/.", + "For each: remediate or mute " + "(add to mutelist + compensating control).", title=f"{label} Verdict", border_style="yellow", ) @@ -656,6 +653,7 @@ def main( ) # --- Node-level MANUAL check verification --- + # Compensating control: node-config-automated-verification # These checks verify conditions Prowler reports as MANUAL because it # runs inside a pod and cannot evaluate them directly. 
run_node_verification(console) diff --git a/mise-tasks/runner-logs b/mise-tasks/runner-logs index 0d3028b..9c988ee 100755 --- a/mise-tasks/runner-logs +++ b/mise-tasks/runner-logs @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="List recent Forgejo Actions runs or fetch logs for a specific job" #USAGE arg "[run_number]" help="Run number to show jobs for (omit to list recent runs)" @@ -229,35 +229,12 @@ def fetch_log(run_number: int, job_index: int, repo: str, token: str) -> None: hex_prefix = f"{task_id & 0xff:02x}" log_path = f"~/forgejo/data/actions_log/{repo}/{hex_prefix}/{task_id}.log.zst" - # indri's login shell (fish) silently swallows SSH exit codes, so we can't - # rely on returncode. zstdcat itself also exits 0 with a "can't stat ... - # -- ignored" stderr message when the file is missing. Detect missing logs - # by running `test -f` over SSH and parsing the marker line from stdout. 
- probe = subprocess.run( - ["ssh", "indri", f"test -f {log_path} && echo EXISTS || echo MISSING"], - capture_output=True, - text=True, - ) - marker = probe.stdout.strip().splitlines()[-1] if probe.stdout.strip() else "" - if marker != "EXISTS": - typer.echo( - f"Error: log not found for run #{run_number} job {job_index} (task {task_id})", - err=True, - ) - typer.echo(f"Path: indri:{log_path}", err=True) - typer.echo( - "The runner may have crashed before uploading its log buffer " - "(action_task.log_in_storage = 0).", - err=True, - ) - raise typer.Exit(1) - result = subprocess.run( ["ssh", "indri", f"zstdcat {log_path}"], capture_output=True, text=True, ) - if result.returncode != 0 or not result.stdout: + if result.returncode != 0: typer.echo( f"Error: could not read log for run #{run_number} job {job_index} (task {task_id})", err=True, diff --git a/mise-tasks/service-review b/mise-tasks/service-review index f83b104..2d50e0b 100755 --- a/mise-tasks/service-review +++ b/mise-tasks/service-review @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["pyyaml==6.0.3", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Review the most stale service for version freshness" #USAGE flag "--limit " default="15" help="Number of services to show in the table" diff --git a/mise-tasks/spork-create b/mise-tasks/spork-create index 3f18563..92f4e5c 100755 --- a/mise-tasks/spork-create +++ b/mise-tasks/spork-create @@ -1,7 +1,7 @@ #!/usr/bin/env -S uv run --script # /// script # requires-python = ">=3.12" -# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.26.2"] +# dependencies = ["httpx==0.28.1", "rich==15.0.0", "typer==0.25.0"] # /// #MISE description="Create a spork (floating-branch soft-fork) of a mirrored upstream project" #USAGE arg "" help="Repository name in the mirrors/ org on forge (e.g. 
kingfisher)" diff --git a/nixos/ringtail/configuration.nix b/nixos/ringtail/configuration.nix index bc893d5..2cc5280 100644 --- a/nixos/ringtail/configuration.nix +++ b/nixos/ringtail/configuration.nix @@ -16,26 +16,8 @@ in systemd.tpm2.enable = false; # Networking - # Wired interface (enp5s0) uses a static IP configured by NixOS scripted - # networking; NetworkManager is left enabled for the wireless fallback only. networking.hostName = "ringtail"; - networking.networkmanager = { - enable = true; - unmanaged = [ "interface-name:enp5s0" ]; - }; - networking.useDHCP = false; - networking.interfaces.enp5s0.ipv4.addresses = [{ - address = "192.168.1.21"; - prefixLength = 24; - }]; - networking.defaultGateway = "192.168.1.1"; - networking.nameservers = [ "192.168.1.1" "1.1.1.1" ]; - - # K3s pod networking and Tailscale tunnel routing require IP forwarding. - # NixOS leaves this off by default; previously it was being enabled - # implicitly by NM/scripted-DHCP setup, but with static networking we - # have to set it explicitly. - boot.kernel.sysctl."net.ipv4.ip_forward" = 1; + networking.networkmanager.enable = true; # Time zone time.timeZone = "America/Los_Angeles"; @@ -337,12 +319,7 @@ in output = { "DP-1" = { mode = "2560x1440@165Hz"; - # VRR off: the OMEN 27i IPS pumps gamma/brightness when the panel - # refresh swings into its low VRR range (e.g. low-fps game - # cutscenes), producing a ~20Hz flicker that compounds over a long - # session until a reboot. Fixed refresh at 165Hz eliminates it. - # If you want VRR back, cap in-game fps so refresh never dips low. - adaptive_sync = "off"; + adaptive_sync = "on"; bg = "~/.config/sway/wallpaper.jpg fill"; }; }; @@ -614,22 +591,6 @@ in AllowSuspendThenHibernate=no ''; - # Cap systemd-coredump. Wine/Proton games (Diablo IV, etc.) segfault - # regularly and dump multi-GB cores; with the stock (effectively unbounded) - # limits, systemd-coredump then spends minutes streaming and compressing the - # dump to disk — e.g. 
a single D4 crash produced a 4.6G core, read 13.7G and - # wrote 17.4G, pinning the CPU and locking up the desktop for ~3.5 minutes. - # Those cores are useless anyway: Nix .so files carry no build-id, so no - # backtrace can be generated. Capping uncompressed size at 1G makes oversized - # cores get logged-but-skipped (the kernel stops dumping once we stop reading) - # while real service cores (well under 1G) are still captured. MaxUse bounds - # the on-disk store so frequent game crashes can't accumulate (was at 8.6G). - systemd.coredump.extraConfig = '' - ProcessSizeMax=1G - ExternalSizeMax=1G - MaxUse=2G - ''; - # NixOS release system.stateVersion = "25.11"; } diff --git a/nixos/ringtail/flake.lock b/nixos/ringtail/flake.lock index bb60501..d6a85dc 100644 --- a/nixos/ringtail/flake.lock +++ b/nixos/ringtail/flake.lock @@ -7,11 +7,11 @@ ] }, "locked": { - "lastModified": 1780290312, - "narHash": "sha256-eTAlX0CwgB84Ts3GaBd944A3DRXVMzgA0EqroZBISUo=", + "lastModified": 1776613567, + "narHash": "sha256-gC9Cp5ibBmGD5awCA9z7xy6MW6iJufhazTYJOiGlCUI=", "owner": "nix-community", "repo": "disko", - "rev": "115e5211780054d8a890b41f0b7734cafad54dfe", + "rev": "32f4236bfc141ae930b5ba2fb604f561fed5219d", "type": "github" }, "original": { @@ -27,11 +27,11 @@ ] }, "locked": { - "lastModified": 1779506708, - "narHash": "sha256-QOD/CNm196nCJRheux/URi4/HE66fthdOMqCJoPP1Y0=", + "lastModified": 1775425411, + "narHash": "sha256-KY6HsebJHEe5nHOWP7ur09mb0drGxYSzE3rQxy62rJo=", "owner": "nix-community", "repo": "home-manager", - "rev": "3ee51fbdac8c8bdfe1e7e1fcaba6520a563f394f", + "rev": "0d02ec1d0a05f88ef9e74b516842900c41f0f2fe", "type": "github" }, "original": { @@ -43,11 +43,11 @@ }, "nixpkgs": { "locked": { - "lastModified": 1779796641, - "narHash": "sha256-ZsIrKmhp4vbBXoXXmR/tBXA/UCsAQiJL9vsgZEduhVY=", + "lastModified": 1777428379, + "narHash": "sha256-ypxFOeDz+CqADEQNL72haqGjvZQdBR5Vc7pyx2JDttI=", "owner": "NixOS", "repo": "nixpkgs", - "rev": 
"25f538306313eae3927264466c70d7001dcea1df", + "rev": "755f5aa91337890c432639c60b6064bb7fe67769", "type": "github" }, "original": { diff --git a/nixos/ringtail/gaming.nix b/nixos/ringtail/gaming.nix index 7c00378..d84ef9b 100644 --- a/nixos/ringtail/gaming.nix +++ b/nixos/ringtail/gaming.nix @@ -5,7 +5,6 @@ programs.steam = { enable = true; dedicatedServer.openFirewall = true; - extraCompatPackages = [ pkgs.proton-ge-bin ]; }; # Proton Experimental ships an accessibility bridge (xalia) that hangs during @@ -13,23 +12,6 @@ # so disable xalia globally to avoid wedging iscriptevaluator.exe. environment.sessionVariables.PROTON_USE_XALIA = "0"; - # Subnautica 2 pre-launch wrapper. SN2 (UE5) writes Saved/running.dat as a - # "currently running" lockfile. If the prior session exited uncleanly (SIGKILL - # via Steam's Stop button, crash, etc.), the file persists and on next launch - # SN2 pops up an invisible (0x0-sized) Error dialog ("Your game might not have - # exited correctly last time...") that the GameThread blocks on forever — - # observable only as a black screen with a spinning loader. This wrapper - # removes the stale lockfiles before exec'ing the actual game command. - # Use as Steam launch option for Subnautica 2: - # sn2-prelaunch %command% - environment.systemPackages = [ - (pkgs.writeShellScriptBin "sn2-prelaunch" '' - saved="/mnt/games/SteamLibrary/steamapps/compatdata/1962700/pfx/drive_c/users/steamuser/AppData/Local/Subnautica2/Saved" - rm -f "$saved/running.dat" "$saved/beforelobby.dat" - exec "$@" - '') - ]; - # Gamescope — micro-compositor for game fullscreen/resolution management. 
# Use as Steam launch option: gamescope -W 2560 -H 1440 -f -- %command% programs.gamescope = { diff --git a/prek.toml b/prek.toml index 2c66b82..add7799 100644 --- a/prek.toml +++ b/prek.toml @@ -28,7 +28,7 @@ hooks = [{ id = "check-yaml", args = ["--unsafe"] }] # Secret detection (running both tools in parallel to compare coverage) [[repos]] repo = "https://github.com/trufflesecurity/trufflehog" -rev = "37b77001d0174ebec2fcca2bd83ff83a6d45a3ab" # v3.95.3 +rev = "17456f8c7d042d8c82c9a8ca9e937231f9f42e26" # v3.95.2 hooks = [ { id = "trufflehog", entry = "trufflehog git file://. --since-commit HEAD --no-verification --fail", stages = [ "pre-commit", @@ -38,7 +38,7 @@ hooks = [ [[repos]] repo = "https://github.com/mongodb/kingfisher" -rev = "6f560103cc6ea082ef4b80a9098e3f3111afb8bc" # v1.101.0 +rev = "9ddec4ab8b53653d4941e6b3fd4ff602ce91d81b" # v1.97.0 hooks = [ { id = "kingfisher", args = [ "scan", @@ -69,12 +69,12 @@ name = "ansible-lint" entry = "env ANSIBLE_ROLES_PATH=ansible/roles ansible-lint" language = "python" files = "^ansible/" -additional_dependencies = ["ansible-lint==26.4.0", "ansible-core==2.21.0"] +additional_dependencies = ["ansible-lint==26.4.0", "ansible-core==2.20.5"] # Python - ruff for linting and formatting [[repos]] repo = "https://github.com/astral-sh/ruff-pre-commit" -rev = "0c7b6c989466a93942def1f84baf36ddfcd60c83" # v0.15.14 +rev = "6fec9b7edb08fd9989088709d864a7826dc74e80" # v0.15.12 hooks = [{ id = "ruff", args = ["--fix"] }, { id = "ruff-format" }] # Python - ty type checker diff --git a/service-versions.yaml b/service-versions.yaml index 866c687..74d467e 100644 --- a/service-versions.yaml +++ b/service-versions.yaml @@ -46,8 +46,8 @@ services: - name: shower type: argocd - last-reviewed: 2026-05-15 - current-version: "1.1.3" + last-reviewed: 2026-05-10 + current-version: "1.0.2" upstream-source: https://forge.eblu.me/eblume/adelaide-baby-shower-app notes: | Django app for Adelaide / Heidi / Addie's baby shower. 
Wheel
@@ -56,8 +56,8 @@ services:
 
   - name: nvidia-device-plugin
     type: argocd
-    last-reviewed: 2026-06-04
-    current-version: "v0.19.2"
+    last-reviewed: 2026-03-27
+    current-version: "v0.19.0"
     upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases
     notes: DaemonSet + RuntimeClass on ringtail for GPU workloads
 
@@ -146,26 +146,22 @@ services:
 
   - name: valkey
     type: argocd
-    last-reviewed: 2026-05-28
-    current-version: "8.1.7"
-    upstream-source: https://github.com/valkey-io/valkey/releases
+    last-reviewed: 2026-05-01
+    current-version: "8.1.6-r0"
+    upstream-source: https://pkgs.alpinelinux.org/package/v3.22/community/aarch64/valkey
     notes: >-
-      Dual-build valkey image: container.py builds Alpine 3.22 + apk valkey
-      (arm64, indri) for paperless; default.nix builds via nixpkgs (amd64,
-      ringtail) for immich-ringtail. Both track upstream valkey 8.1.x; Alpine
-      3.22 currently ships 8.1.7-r0 and nixpkgs valkey is 8.1.7. Alpine 3.23
-      jumps to 9.0. Distinct from authentik-redis (nix-built Redis
+      Shared Alpine-built valkey image, used as a sidecar/cache by paperless
+      (sidecar) and immich (separate Deployment). Mirrors the upstream
+      docker.io/valkey/valkey:8.1-alpine. Pinned to Alpine 3.22 for valkey 8.1.x;
+      Alpine 3.23 jumps to 9.0. Distinct from authentik-redis (nix-built Redis
       8.x) which has its own entry.
 
   - name: external-secrets
     type: argocd
-    last-reviewed: 2026-06-04
+    last-reviewed: 2026-03-25
     current-version: "v2.2.0"
     upstream-source: https://github.com/external-secrets/external-secrets/releases
-    notes: >-
-      Static kustomize manifests rendered from upstream Helm chart. Controller
-      image is locally built from the forge mirror via containers/external-secrets/container.py
-      (single all_providers static Go binary).
+    notes: Static kustomize manifests rendered from upstream Helm chart
 
   - name: 1password-connect
     type: argocd
@@ -225,17 +221,9 @@ services:
 
   - name: teslamate
     type: argocd
-    last-reviewed: "2026-06-03"
+    last-reviewed: 2026-04-14
     current-version: "v3.0.0"
     upstream-source: https://github.com/teslamate-org/teslamate/releases
-    notes: >-
-      Tesla data logger. Container ported from Dagger (container.py) to Nix
-      (containers/teslamate/default.nix) — a from-scratch beamPackages
-      mixRelease (Elixir/Phoenix release with npm-built assets), since
-      teslamate is not in nixpkgs. Pins erlang_27 + elixir_1_18 from the
-      shared nixos-unstable rev; assets via in-release npm ci + esbuild;
-      ex_cldr locale data pre-fetched (LOCALES env) to avoid sandbox
-      downloads. Version unchanged (v3.0.0). Build verified on ringtail.
 
   - name: transmission
     type: argocd
@@ -339,36 +327,22 @@ services:
 
   - name: mealie
     type: argocd
-    last-reviewed: "2026-06-03"
-    current-version: "v3.16.0"
+    last-reviewed: 2026-03-16
+    current-version: "v3.12.0"
     upstream-source: https://github.com/mealie-recipes/mealie/releases
-    notes: >-
-      Recipe manager. Container ported from Dockerfile to Nix
-      (containers/mealie/default.nix wraps nixpkgs mealie from a pinned
-      nixos-unstable; single gunicorn process, SQLite on the mealie-data
-      PVC). Bumped v3.12.0 -> v3.16.0 as part of the port (the deferred
-      upgrade). Breaking-change review v3.13-v3.16: no schema breaking
-      changes, SQLite auto-migrates forward via init_db; notable items are
-      minor (OIDC missing-claims log -> DEBUG, NLP parser uses user-defined
-      units, Nuxt 3->4 frontend, new Announcements feature, path-traversal
-      patches). Source PVC retained for rollback. Build verified on ringtail.
+    notes: Recipe manager; built from source via forge mirror
 
   - name: paperless
     type: argocd
-    last-reviewed: "2026-06-03"
-    current-version: "v2.20.15"
+    last-reviewed: "2026-04-08"
+    current-version: "v2.20.13"
     upstream-source: https://github.com/paperless-ngx/paperless-ngx/releases
-    notes: >-
-      Document management. Container ported from Dockerfile to Nix
-      (containers/paperless/default.nix wraps nixpkgs paperless-ngx from a
-      pinned nixos-unstable). Runs as web/worker/beat/consumer containers on
-      ringtail (multi-process; no s6). Bumped v2.20.13 -> v2.20.15 (the
-      unstable package version, same-minor patch) as part of the port.
+    notes: Document management; built from source via forge mirror
 
   - name: unpoller
     type: argocd
-    last-reviewed: 2026-05-28
-    current-version: "v3.2.0"
+    last-reviewed: 2026-03-16
+    current-version: "v2.34.0"
     upstream-source: https://github.com/unpoller/unpoller/releases
     notes: UniFi metrics exporter for Prometheus
 
@@ -414,23 +388,6 @@ services:
     upstream-source: https://github.com/caddyserver/caddy/releases
     notes: Built from source with Gandi DNS and Layer 4 plugins
 
-  - name: heph
-    type: ansible
-    last-reviewed: 2026-06-05
-    current-version: "v1.2.1"
-    upstream-source: https://forge.eblu.me/eblume/hephaestus/releases
-    notes: >-
-      hephaestus task/context sync hub on indri (server-mode launchagent,
-      ansible/roles/heph; cargo-built from the forge). SELF-UPDATING: hephd
-      polls the forge for newer releases every 10 min and rebuilds + restarts
-      itself, so the running version drifts AHEAD of the ansible heph_version
-      pin. current-version here is the last observed/deployed tag, not a hard
-      pin — verify the live version via `curl https://heph.ops.eblu.me/config`
-      is served (hub up) and the hub log's `current=` line. Reconciling this
-      self-update vs IaC-pin drift is tracked in the heph "Hephaestus" project:
-      "Reconcile hephd self-update with ansible-pinned version (drift on indri
-      hub)" (node 01KTBXWT6XTHNDH92CVJY88E5K).
-
   - name: borgmatic
     type: ansible
     last-reviewed: 2026-04-15
diff --git a/src/blumeops/main.py b/src/blumeops/main.py
index 9bbd12f..94b932b 100644
--- a/src/blumeops/main.py
+++ b/src/blumeops/main.py
@@ -80,10 +80,6 @@ class Blumeops:
                     "git",
                     "clone",
                     "--depth=1",
-                    # Pin to last v4 release. v5.0.0 restructured config
-                    # layout (.quartz/plugins, ../quartz imports) and breaks
-                    # our quartz.config.ts/quartz.layout.ts. See changelog.
-                    "--branch=v4.5.2",
                     "https://github.com/jackyzha0/quartz.git",
                     "/tmp/quartz",
                 ]