The kind=static branch added in #342 put handle_errors inside the
@host handle{} block. handle_errors is a top-level site-block directive,
not an ordered HTTP handler, so Caddy refuses to load the config:
parsing caddyfile tokens for 'handle': directive 'handle_errors'
is not an ordered HTTP handler
This crash-loops the whole reverse proxy and takes down every
*.ops.eblu.me service. Tripped today during the live cv/docs cutover.
Fix: drop handle_errors and append /404.html as the final try_files
candidate. The 404 page is served with status 200 instead of 404, but
that's acceptable for a human-facing curated 404 — the page renders
correctly. Documented inline.
The running Caddy on indri already has the fixed config (deployed
manually during the cutover); this lands the fix in main so future
provision-indri --tags caddy runs don't re-break it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down.
## What's in this commit
- `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/`
- `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`)
- `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy`
- `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit
- `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi)
- Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment
## What is not in this commit
The dead artifacts. After PR review and successful cutover, a follow-up commit deletes:
- `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/`
- `containers/cv/`, `containers/quartz/`
- `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check`
- bumps `docs.current-version` in `service-versions.yaml` to the release tag
## Cutover plan (manual, from gilbert, after review)
1. **Take down old:**
- Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade`
- Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty)
- Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail
2. **Bring up new:**
- `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch)
- `mise run provision-indri -- --tags cv,docs,caddy`
- `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"`
3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md`
4. **Cleanup commit + merge.**
Total expected downtime: minutes (not the few-hour budget you authorized).
## Test plan
- [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean
- [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread
- [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200
- [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment`
- [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page
- [ ] `mise run services-check` clean
- [ ] `mise run service-review --type ansible` shows cv and docs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #342
## Summary
Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.
## What changed
- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.
## Already deployed (live on indri)
- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 ✅
- `mise run docs-mikado` works again ✅
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed
## Test plan
- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)
## Context
This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.
Reviewed-on: #341
LaunchAgents now call borgmatic directly at its mise-installed path
instead of routing through `mise x`, which triggered macOS TCC
permission dialogs (e.g. "mise wants to access Documents") that hung
headless sessions and caused backup failures.
Also adds `mise install` to the ansible role so borgmatic installation
is fully managed, and pins the version in both mise.toml and the role
defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Discovered during DR that paperless was the only service DB not backed
up by borgmatic. Uses same blumeops-pg cluster on port 5432.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The storage-provisioner is a bare Pod with no controller. If the node
restarts via Docker Desktop (rather than `minikube start`), kubelet
restores static pods but bare pods are lost. Detect this and re-run
`minikube start` to restore addons.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Add `nixpkgs-services` flake input pinned to a specific nixpkgs commit, with an overlay that pulls `forgejo-runner`, `snowflake`, and `k3s` from it instead of the rolling `nixpkgs`
- Dagger `flake-update` pipeline now excludes `nixpkgs-services` via `--exclude`
- Fix stale nix-container-builder version in service-versions.yaml (was 12.6.4, actually running 12.7.2)
- Add k3s and minikube to service-versions.yaml tracking
- Document the pinning approach in review-services how-to and ringtail reference
## Motivation
During service review, discovered that flake updates had silently upgraded forgejo-runner from 12.6.4 → 12.7.2 without updating service-versions.yaml. This "sneak-in upgrade" bypasses the service review process. The overlay ensures these three services only change versions deliberately.
## Test plan
- [ ] Verify `nix flake update` from `nixos/ringtail/` does not change `nixpkgs-services` lock entry
- [ ] Verify `mise run provision-ringtail` builds successfully with the overlay
- [ ] Confirm running service versions unchanged after deploy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #321
Restrict backup to library/ and upload/ only (skip regenerable encoded-video/,
thumbs/, backups/). Add SSH ServerAliveInterval to prevent broken pipe on long
transfers, and checkpoint_interval so interrupted backups save progress.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Adds a second borgmatic config (`photos.yaml`) that backs up `/Volumes/photos` (sifaka SMB mount, ~128 GB) to a dedicated BorgBase repo (`immich-photos`), running daily at 4 AM
- Separate launchd agent (`mcquack.eblume.borgmatic-photos`) so photo backups run independently from the main backup
- Refactors `borgmatic_metrics` script to support multiple repos with a `repo` Prometheus label
- Updates Grafana "Borg Backups" dashboard with a `repo` template variable so you can filter/compare repos
- Docs updated: `backups.md`, `borgmatic.md`
## Prerequisites (manual)
- [x] Create `immich-photos` repo on BorgBase with same SSH key
- [ ] Upgrade BorgBase plan to Small ($24/yr) if currently on free tier (128 GB exceeds 10 GB limit)
- [ ] After deploy: `borg init` the new repo (borgmatic does this automatically on first run)
## Test plan
- [ ] Dry run: `mise run provision-indri -- --check --diff --tags borgmatic,borgmatic_metrics`
- [ ] Deploy borgmatic role and verify both configs deployed
- [ ] Run `borgmatic --config ~/.config/borgmatic/photos.yaml create --verbosity 1` manually for first backup (will take hours)
- [ ] Verify metrics script collects from both repos: `~/.local/bin/borgmatic-metrics && cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom`
- [ ] Sync grafana-config in ArgoCD and verify dashboard repo selector works
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #315
## Summary
- Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups
- Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups
- For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433
- Update backup docs to reflect all four CNPG databases + mealie SQLite
## Deploy plan
Deploy order matters — k8s resources must exist before ansible can route to them:
1. **ArgoCD (databases app):** sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service
```
argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups
argocd app sync blumeops-pg
```
2. **Wait** for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve
3. **Ansible (caddy):** deploy Caddy L4 route for port 5433
```
mise run provision-indri -- --tags caddy
```
4. **Ansible (borgmatic):** deploy updated config and .pgpass
```
mise run provision-indri -- --tags borgmatic
```
5. **Verify:** trigger a manual borgmatic run and check all four pg_dump streams succeed
```
borgmatic --verbosity 1 2>&1 | grep -E '(Dumping|ERROR)'
```
## Test plan
- [x] `kubectl kustomize` builds cleanly
- [x] `ansible --check --diff` for borgmatic and caddy show expected changes
- [ ] ArgoCD sync succeeds for databases app
- [ ] `immich-pg.tail8d86e.ts.net` resolves
- [ ] `pg.ops.eblu.me:5433` accepts connections
- [ ] `borgmatic --verbosity 1` dumps all four databases without errors
Reviewed-on: #314
The Mealie SQLite dump hook used `minikube-indri` (the context name on
gilbert), but on indri itself the context is just `minikube`. This caused
the before_backup hook to fail, aborting all backups since the hook was added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caddy v2.11 (#7454) auto-rewrites the Host header to match the
upstream address for HTTPS backends. This causes services behind
Tailscale Ingress to see *.tail8d86e.ts.net instead of *.ops.eblu.me,
breaking Authentik OAuth flows, Homepage host validation, and other
services that check the Host header.
Only apply header_up for HTTPS backends (Tailscale Ingress); HTTP
backends (forge, registry, jellyfin, sifaka) are unaffected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade zot OCI registry from v2.1.13 to v2.1.15 on indri
- Addresses CVE-2025-30204 (golang-jwt memory) and open redirect via callback_ui
- No config template changes needed (externalUrl is auto-allowlisted)
- Requires Go 1.25.7 (bump from 1.25.6 via mise)
## Data Safety
- Data directory ~/erichblume/zot is NOT touched during build or deploy
- No schema migrations in v2.1.14 or v2.1.15
- Storage format remains OCI spec 1.1.0
## Deployment Steps
- [ ] SSH to indri: bump Go to 1.25.7 via `mise use go@1.25.7`
- [ ] Fetch and checkout v2.1.15 in ~/code/3rd/zot
- [ ] Build: `mise x -- make binary`
- [ ] Restart LaunchAgent
- [ ] Verify: `curl -s http://localhost:5050/v2/` returns 200
- [ ] Verify: `curl -s https://registry.ops.eblu.me/v2/_catalog` lists repos
- [ ] Verify: `mise run services-check`
Reviewed-on: #293
## Summary
C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.
### Mikado Graph
```
deploy-jobsync (goal)
├── build-jobsync-container
│ └── mirror-jobsync
└── integrate-jobsync-ollama
```
### What is JobSync?
Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.
### Key Decisions
- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed
### Current Status
Planning phase — cards committed, ready for review before implementation begins.
Reviewed-on: #288
## Summary
- Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs
- Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests)
- Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki
- Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with:
- **Kubernetes Clusters** dashboard — multi-cluster with `cluster` and `namespace` template variables
- **Ringtail (k3s)** dashboard — dedicated ringtail view with GPU usage panels
## Deployment and Testing
1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`)
2. Sync `prometheus` → verify `cluster` label on scraped metrics
3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs
4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs
5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail
6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}`
7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values
8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods
## Notes
- Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later
- DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve
- The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270
## Summary
- **mirror-create**: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time
- **mirror-update-pats**: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run`
- **app.ini.j2**: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility)
- **manage-forgejo-mirrors**: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure
## Context
GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors.
## Deployment and Testing
- [x] `mirror-update-pats` dry-run verified (23 mirrors detected)
- [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri
- [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped)
- [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config
- [ ] Trigger a manual mirror sync and verify success in Forgejo UI
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269
## Summary
- Widen `repo-creds-forge` URL prefix from `/eblume/` to host-wide `/` so it matches repos in all forge orgs (fixes `mirrors/` repos not getting SSH credentials)
- Update 8 ArgoCD app definitions from `eblume/<mirror>` → `mirrors/<mirror>` (immich-charts, cloudnative-pg-charts, external-secrets, connect-helm-charts)
- Fix stale alloy clone comment in Ansible defaults
- Bump immich v2.5.2 → v2.5.6 (bug-fix patches only)
- Update ArgoCD README bootstrap command and credential docs
## Context
Mirrors were migrated from `forge.ops.eblu.me/eblume/` to `forge.ops.eblu.me/mirrors/` in commit `cd57814`. Container Dockerfiles and image tags were updated, but ArgoCD app definitions and the repo credential template were missed, causing `ComparisonError` on apps that source Helm charts from mirrored repos.
## Deployment
1. Sync the ArgoCD `argocd` app first (picks up the widened credential template)
2. Sync the `apps` app (picks up new repo URLs for all 8 apps)
3. Verify immich resolves its ComparisonError: `argocd app get immich`
4. Sync immich to deploy v2.5.6: `argocd app sync immich`
5. Spot-check: `argocd app get external-secrets`, `argocd app get cloudnative-pg`, `argocd app get 1password-connect`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/266
Deploy branding.xml with a "Sign in with Authentik" button in the
login disclaimer. Local password login remains available.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add accessControl.metrics.users with empty string to allow
unauthenticated Prometheus/Alloy scraping. Zot represents
anonymous users with an empty username internally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Enable OIDC + API key authentication on zot registry with three-tier accessControl
- `anonymousPolicy: ["read"]` — anyone can pull
- `artifact-workloads` group: `["read", "create"]` — CI push, no overwrite/delete
- `admins` group: `["read", "create", "update", "delete"]` — break-glass
- Wire both CI push paths (Dagger and Nix/skopeo) with `ZOT_CI_API_KEY` credentials
- Add `artifact-workloads` PolicyBinding in Authentik blueprint for zot app access
- Add `ZOT_CI_API_KEY` to Forgejo Actions secrets via existing ansible role
Completes the `wire-ci-registry-auth` and `harden-zot-registry` Mikado cards.
## Manual Deployment Steps (after merge)
1. Deploy Authentik blueprint: `argocd app sync authentik`
2. In Authentik admin UI: set a password for the `zot-ci` service account
3. Deploy zot config: `mise run provision-indri -- --tags zot`
4. Log in to `https://registry.ops.eblu.me` as `zot-ci` via OIDC → generate API key
5. Store API key in 1Password as `zot-ci-apikey` in blumeops vault
6. Sync Forgejo secrets: `mise run provision-indri -- --tags forgejo_actions_secrets`
7. Trigger a test container build to verify CI push
8. Verify anonymous pull: `curl -sf https://registry.ops.eblu.me/v2/_catalog`
## Uncertainties
- **Zot `accessControl` group matching with OIDC:** Groups from Authentik's `profile` scope claim should map to zot policy groups, but the exact claim-to-group matching needs runtime verification
- **`http.auth.apikey: true`:** This config key is documented but needs verification against the specific zot version built from source on indri
- **API key permissions:** Need to confirm zot API keys inherit the generating user's group for accessControl evaluation
## Test Plan
- [ ] `mise run provision-indri -- --check --diff --tags zot` shows expected config changes
- [ ] Anonymous pull works after deploy
- [ ] Unauthenticated push fails (401)
- [ ] OIDC browser login redirects to Authentik and back
- [ ] API key push works after key generation
- [ ] CI push succeeds with both Dagger and skopeo paths
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/237
## Summary
- Add Authentik blueprint (`zot.yaml`) with OAuth2 provider, application, `artifact-workloads` group, and `zot-ci` service account
- Wire `zot-client-secret` through ExternalSecret → worker Deployment env var → blueprint `!Env`
- Add Ansible pre_task to fetch OIDC secret from 1Password (item ID `oor7os5kapczgpbwv7obkca4y4`)
- Add `oidc-credentials.json.j2` template and deploy task in zot role (with `when` guard)
## Manual Steps Required Before Deploy
1. Generate client secret: `openssl rand -hex 32`
2. Store in 1Password: add field `zot-client-secret` to "Authentik (blumeops)" item in vault `blumeops`
## What This Does NOT Do
- Does NOT modify `config.json.j2` (that's the root goal `harden-zot-registry`)
- Does NOT wire CI auth (that's `wire-ci-registry-auth`)
- Does NOT set service account password or API keys (manual post-deploy)
## Verification
After ArgoCD sync:
- [ ] Authentik admin UI shows "Zot Registry" application
- [ ] OIDC discovery at `https://authentik.ops.eblu.me/application/o/zot/.well-known/openid-configuration` returns valid JSON
- [ ] Blueprint status is `successful`
- [ ] `artifact-workloads` group exists with `zot-ci` service account
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/236
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.
This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.
## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
- `build-authentik-container` — Build Nix container image
- `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
- `create-authentik-secrets` — Create 1Password item with credentials
## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early
## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227
## Summary
- Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired
- Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s
- Move ntfy ArgoCD app from indri minikube to ringtail k3s
- Update Frigate docs to reflect detector removal and planned RTX 4080 migration
- Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail
## Deployment
After merge:
1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps:
```
argocd app sync apps --prune
```
2. Sync new ringtail apps:
```
argocd app sync mosquitto-ringtail
argocd app sync ntfy-ringtail
```
3. Manually clean up the detector LaunchAgent on indri:
```
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
```
## Notes
- Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns)
- ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup
- Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed
- Frigate + frigate-notify will be ported to ringtail in a follow-up PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216
## Summary
- Replace `changed_when: true` with `register` + output inspection on the two 1Password secret tasks in `ringtail.yml`
- Tasks now correctly report `ok` when the secret content hasn't changed, and `changed` only when `kubectl apply` outputs `configured` or `created`
## Test plan
- [ ] Run `mise run provision-ringtail` twice — second run should show both tasks as `ok` not `changed`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/213
## Summary
Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions
runner for building containers with Nix.
### K3s cluster
- Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint)
- TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale
- Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`)
- Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access
- `kubectl` added to system packages
### 1Password Connect + External Secrets Operator
- Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`,
`external-secrets-config-ringtail`
- Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server
- Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password`
namespace in post_tasks
### Systemd Forgejo Actions runner
- Native `services.gitea-actions-runner` with `forgejo-runner` package — no DinD, no k8s pod, runs directly on the NixOS host
- Label `nix-container-builder:host` — jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH
- Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env`
- Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access
### Nix container build workflow
- New `.forgejo/workflows/build-container-nix.yaml` triggers on `*-nix-v[0-9]*` tags (e.g. `nettest-nix-v1.0.0`)
- Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy`
- Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering
### Mise task updates
- `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`)
- `container-list` shows build type indicator (`[nix]` / `[dockerfile]`)
## Post-merge
1. `mise run provision-ringtail` — deploys k3s token, runner token, NixOS rebuild
2. Register k3s cluster in ArgoCD (first time only):
```fish
ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' | \
sed 's|127.0.0.1|ringtail.tail8d86e.ts.net|' > /tmp/k3s-ringtail.yaml
set -x KUBECONFIG /tmp/k3s-ringtail.yaml
argocd cluster add default --name k3s-ringtail
3. Sync ArgoCD apps in order: 1password-connect-ringtail -> external-secrets-crds-ringtail -> external-secrets-ringtail ->
external-secrets-config-ringtail
4. Verify runner: ssh ringtail 'systemctl status gitea-runner-nix-container-builder'
5. Check Forgejo admin panel for ringtail-nix-builder runner online
6. Test: create containers/<name>/default.nix, tag with <name>-nix-v0.1.0
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209
nixos-rebuild can dirty the tree (e.g. flake.lock updates), which
blocks the Ansible git module. Force ensures we always reset to the
upstream state.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Uses subelements loop to sync secrets across repos. Adds FORGE_TOKEN
to the cv repo for package uploads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz)
- ArgoCD app + k8s manifests (deployment, service, Tailscale ingress)
- Caddy route for `cv.ops.eblu.me`
- Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD
- Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv)
## Deployment steps (after merge)
1. `mise run container-tag-and-release cv v1.0.0`
2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`)
3. Run "Deploy CV" workflow in blumeops (default: latest)
4. `mise run provision-indri -- --tags caddy`
5. Verify at `https://cv.ops.eblu.me/`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169
## Summary
- Adds BorgBase as a second borgmatic repository for offsite backups (US region, append-only)
- SSH key managed via 1Password, deployed to indri by Ansible
- Borgmatic `ssh_command` configured to use the dedicated BorgBase key
- BorgBase host key pinned in known_hosts via Ansible
## Post-merge deployment steps
1. Provision borgmatic: `mise run provision-indri -- --tags borgmatic`
2. Initialize the BorgBase repo: `ssh indri 'mise x -- borgmatic init --encryption repokey --repository borgbase-offsite'`
3. Export and store the borg repokey: `ssh indri 'borg key export ssh://k04ljcd7@k04ljcd7.repo.borgbase.com/./repo'` → save to 1Password
4. Verify first backup: `ssh indri 'mise x -- borgmatic create --repository borgbase-offsite --verbosity 1'`
## BorgBase setup (already done)
- Account created, API token in 1Password (`borgbase` item in blumeops vault)
- SSH keypair generated, stored in 1Password, public key uploaded to BorgBase (ID: 200815)
- Repository `indri-borgmatic` created (ID: k04ljcd7, US region, append-only, 2-day alert)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/142
## Summary
- After a power loss, minikube's Docker container (host) restarts but kubelet/apiserver remain stopped
- The ansible role's status check used `--format='{{.Host}}'` which only examined the host VM state
- When host=Running but kubelet/apiserver=Stopped, the role skipped `minikube start`
- Fixed to use full `minikube status` exit code (returns non-zero when any component is unhealthy)
- Simplified all downstream conditions to use exit code instead of string matching
## Test plan
- [x] Verified the fix correctly skips `minikube start` when cluster is already fully running
- [x] Pre-commit hooks pass (ansible-lint, yamllint, etc.)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/137