Staging deployments on ringtail k3s, in parallel with the minikube apps
until per-service cutover. Each uses the Nix image built at 1d4cbbf
(paperless v2.20.15, mealie v3.16.0, teslamate v3.0.0, all -nix tags) and
points postgres at the in-cluster ringtail blumeops-pg.
- paperless: redesigned as web/worker/beat/consumer + redis in one pod
(Nix image has no s6 supervisor); media on a ringtail-suffixed NFS PV
(needs a sifaka rule for ringtail).
- mealie: single gunicorn; SQLite PVC (local-path) copied at cutover.
- teslamate: stateless; DATABASE_HOST already in-cluster, unchanged.
ArgoCD apps target ringtail (https://ringtail.tail8d86e.ts.net:6443).
Not synced yet; deploy-from-branch + cutover is the next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary
C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.
This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.
## Goal end-state
- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.
## Cards
| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |
## Key constraints
- **No data loss.** Downtime is acceptable; data loss is not. Two
surfaces matter: postgres (ML embeddings, face data — slow to
re-derive) and the library files (don't move, but NFS access from
ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster`
basebackup → promote. Option B is `pg_dump`/`pg_restore` as a
documented fallback. Either way, dry-run against a scratch
cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube
would block the whole decommission, and Immich is chatty with pg
so tailnet round-trips would hurt.
## Test plan
- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
chain correctly.
- [ ] Per-card implementation cycles land separately (commit
convention enforced by hook).
Reviewed-on: #356
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: #349
Repoint the ArgoCD Application destination from minikube to ringtail and
bump the image tag to the new amd64 nix-built v1.11.0-b87f62e-nix.
Rework services.yaml for the autodiscovery shift: 11 services that
previously auto-populated via minikube Ingress annotations (ArgoCD,
Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, Navidrome,
Paperless, TeslaMate, Transmission) become explicit static entries with
their widget configs preserved. Conversely, the ringtail services that
will now auto-populate (Frigate/NVR, Authentik, Ntfy) are removed from
the static list to avoid duplicates; Ollama becomes newly visible.
Add a Content group for Immich/Kiwix/Miniflux which previously lived
under the autodiscovered "Content" group from annotations.
## Summary
Follow-up to #342. The cv and docs services are now live on indri (Caddy file_server backed by ansible-managed tarball extraction) and verified working. This PR removes the dead minikube artifacts and the tooling shims that referenced them.
## Changes
**Deletions:**
- ``argocd/apps/{cv,docs}.yaml``
- ``argocd/manifests/{cv,docs}/`` (deployment, service, ingress, pdb, kustomization)
- ``containers/{cv,quartz}/`` (Dockerfiles + start scripts)
**Tooling:**
- ``mise-tasks/container-version-check``: remove the ``quartz``→``docs`` CONTAINER_TO_SERVICE mapping (containers/quartz no longer exists)
- ``service-versions.yaml``: bump ``docs.current-version`` to ``v1.16.0`` (the blumeops docs release tag) and trim the migration-window comment
## Live state context
The argocd Applications ``cv`` and ``docs`` were already deleted from the cluster manually as part of the cutover; this PR just removes the YAML files that the ``apps`` app-of-apps was still ingesting. After merge, ``argocd app sync apps`` will reconcile and the ``apps`` Application returns to Synced.
The Caddyfile ``handle_errors`` bug that briefly crashed all ``*.ops.eblu.me`` services during cutover is fixed in a separate C0 (``2ee53fe``) on main, not here.
## Test plan
- [x] ``mise run container-version-check --all-files`` clean
- [x] ``mise run service-review --type ansible`` shows cv at 1.0.3, docs at v1.16.0
- [ ] After merge: ``argocd app sync apps`` returns clean (cv/docs entries gone, no children to reconcile)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #343
## Summary
Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.
## What changed
- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.
## Already deployed (live on indri)
- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 ✅
- `mise run docs-mikado` works again ✅
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed
## Test plan
- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)
## Context
This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.
Reviewed-on: #341
Replace the Helm chart deployment with plain kustomize manifests following
the Authentik pattern (separate deployments per component). Consolidate
the immich-storage ArgoCD app into the main immich app. Add no-helm-policy
doc establishing kustomize as the standard deployment mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Deploys MongoDB Kingfisher as a weekly CronJob on minikube-indri
- Scans all Forgejo repos (eblume + all orgs) for leaked secrets with live validation
- Produces timestamped HTML and JSON reports on sifaka NFS (`/volume1/reports/kingfisher/`)
- Forgejo API token sourced from 1Password via ExternalSecret
- Uses official `ghcr.io/mongodb/kingfisher:1.91.0` container image
- Runs Sunday 4am (after Prowler's 3am k8s scan)
## Resources
- CronJob, PV/PVC (sifaka NFS), ExternalSecret
- ArgoCD Application with manual sync + CreateNamespace
## Test plan
- [x] Sync ArgoCD `apps` app to pick up new kingfisher Application
- [x] Set `--revision feature/kingfisher-cronjob` on kingfisher app
- [x] Verify ExternalSecret creates the `kingfisher-forgejo-token` Secret
- [x] Trigger manual job: `kubectl create job --from=cronjob/kingfisher kingfisher-manual -n kingfisher --context=minikube-indri`
- [ ] Verify reports appear on sifaka at `/volume1/reports/kingfisher/`
- [ ] After merge: set `--revision main` and re-sync
Reviewed-on: #317
## Summary
- Upgrade External Secrets Operator from v1.3.2 (helm-chart-2.0.0) to v2.2.0
- Migrate from Helm chart deployment to static kustomize manifests, matching the repo's kustomize-first pattern
- Merge separate `-config` ArgoCD apps into the main operator apps (6 → 4 apps)
- Clean up Helm-specific labels (`helm.sh/chart`, `managed-by: Helm`)
- Update README example from v1beta1 to v1 API
## Breaking changes assessment
Low risk — v2.0.0 removed Alibaba and Device42 providers (we use neither). No templating changes affect us. All ExternalSecrets already use v1 API.
## Deployment steps
1. Sync CRDs first on both clusters (new CRD version)
2. Sync operator apps (now kustomize-based)
3. Verify ClusterSecretStore and all ExternalSecrets are healthy
4. Delete orphaned config apps: `argocd app delete external-secrets-config` and `-config-ringtail`
5. `mise run services-check`
Reviewed-on: #312
Remove `group: ""` from ignoreDifferences in tailscale-operator and
tailscale-operator-ringtail — ArgoCD normalizes away the empty string
field, so the live state never matches git.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Deploy UnPoller as a k8s service on indri to export UniFi controller metrics to Prometheus
- Custom-built container from forge mirror (`containers/unpoller/Dockerfile`)
- Credentials pulled from 1Password via external-secrets
- Prometheus scrape job added, docs and service-versions updated
## Test plan
- [ ] Build container: `mise run container-release unpoller v2.34.0`
- [ ] Update kustomization tag with built image tag
- [ ] Deploy from branch: `argocd app set unpoller --revision feature/unpoller && argocd app sync unpoller`
- [ ] Verify pod connects to UX7 controller (check logs)
- [ ] Confirm `unpoller` target appears in Prometheus
- [ ] Query `unifi_` metrics in Grafana
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #298
Mosquitto has been dormant since frigate-notify switched from MQTT to
webapi polling (529ba10). Tear down live infra (ArgoCD app, namespace)
and remove all manifests, service-versions entry, services-check, and
doc references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.
### Mikado Graph
```
deploy-jobsync (goal)
├── build-jobsync-container
│ └── mirror-jobsync
└── integrate-jobsync-ollama
```
### What is JobSync?
Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.
### Key Decisions
- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed
### Current Status
Planning phase — cards committed, ready for review before implementation begins.
Reviewed-on: #288
Was the only app still using https://forge.eblu.me (public proxy) for
git polling. All other apps already use the internal SSH endpoint at
forge.ops.eblu.me.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs
- Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests)
- Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki
- Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with:
- **Kubernetes Clusters** dashboard — multi-cluster with `cluster` and `namespace` template variables
- **Ringtail (k3s)** dashboard — dedicated ringtail view with GPU usage panels
## Deployment and Testing
1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`)
2. Sync `prometheus` → verify `cluster` label on scraped metrics
3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs
4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs
5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail
6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}`
7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values
8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods
## Notes
- Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later
- DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve
- The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270
## Summary
- Point ArgoCD app directly at forge-mirrored upstream repo (`mirrors/cloudnative-pg`) instead of the Helm charts repo
- Use `directory.include` to select the specific release manifest (`cnpg-1.27.1.yaml`) from the `releases/` directory
- No vendored files, no Helm — upgrades are a two-line change (`targetRevision` + `directory.include`)
- Delete unused `values.yaml` (was empty, all Helm defaults)
## Deployment and Testing
- [ ] Register mirror repo in ArgoCD: `argocd repo add ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git --ssh-private-key-path <key>`
- [ ] `argocd app set cloudnative-pg --revision feature/cnpg-direct-source && argocd app sync cloudnative-pg`
- [ ] Verify operator pod running: `kubectl get pods -n cnpg-system --context=minikube-indri`
- [ ] Verify CRDs exist: `kubectl get crd --context=minikube-indri | grep cnpg`
- [ ] Verify existing clusters healthy: `kubectl get clusters -A --context=minikube-indri`
- [ ] After merge: `argocd app set cloudnative-pg --revision main && argocd app sync cloudnative-pg`
## Notes
- The forge mirror was created via `mise run mirror-create` from `https://github.com/cloudnative-pg/cloudnative-pg.git`
- ArgoCD may need the mirror repo added to its known repositories if the credential template doesn't already match `mirrors/*`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/268
## Summary
- Widen `repo-creds-forge` URL prefix from `/eblume/` to host-wide `/` so it matches repos in all forge orgs (fixes `mirrors/` repos not getting SSH credentials)
- Update 8 ArgoCD app definitions from `eblume/<mirror>` → `mirrors/<mirror>` (immich-charts, cloudnative-pg-charts, external-secrets, connect-helm-charts)
- Fix stale alloy clone comment in Ansible defaults
- Bump immich v2.5.2 → v2.5.6 (bug-fix patches only)
- Update ArgoCD README bootstrap command and credential docs
## Context
Mirrors were migrated from `forge.ops.eblu.me/eblume/` to `forge.ops.eblu.me/mirrors/` in commit `cd57814`. Container Dockerfiles and image tags were updated, but ArgoCD app definitions and the repo credential template were missed, causing `ComparisonError` on apps that source Helm charts from mirrored repos.
## Deployment
1. Sync the ArgoCD `argocd` app first (picks up the widened credential template)
2. Sync the `apps` app (picks up new repo URLs for all 8 apps)
3. Verify immich resolves its ComparisonError: `argocd app get immich`
4. Sync immich to deploy v2.5.6: `argocd app sync immich`
5. Spot-check: `argocd app get external-secrets`, `argocd app get cloudnative-pg`, `argocd app get 1password-connect`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/266
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.
This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.
## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
- `build-authentik-container` — Build Nix container image
- `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
- `create-authentik-secrets` — Create 1Password item with credentials
## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early
## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227
## Summary
- Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired
- Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s
- Move ntfy ArgoCD app from indri minikube to ringtail k3s
- Update Frigate docs to reflect detector removal and planned RTX 4080 migration
- Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail
## Deployment
After merge:
1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps:
```
argocd app sync apps --prune
```
2. Sync new ringtail apps:
```
argocd app sync mosquitto-ringtail
argocd app sync ntfy-ringtail
```
3. Manually clean up the detector LaunchAgent on indri:
```
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
```
## Notes
- Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns)
- ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup
- Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed
- Frigate + frigate-notify will be ported to ringtail in a follow-up PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216
## Summary
Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions
runner for building containers with Nix.
### K3s cluster
- Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint)
- TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale
- Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`)
- Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access
- `kubectl` added to system packages
### 1Password Connect + External Secrets Operator
- Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`,
`external-secrets-config-ringtail`
- Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server
- Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password`
namespace in post_tasks
### Systemd Forgejo Actions runner
- Native `services.gitea-actions-runner` with `forgejo-runner` package — no DinD, no k8s pod, runs directly on the NixOS host
- Label `nix-container-builder:host` — jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH
- Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env`
- Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access
### Nix container build workflow
- New `.forgejo/workflows/build-container-nix.yaml` triggers on `*-nix-v[0-9]*` tags (e.g. `nettest-nix-v1.0.0`)
- Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy`
- Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering
### Mise task updates
- `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`)
- `container-list` shows build type indicator (`[nix]` / `[dockerfile]`)
## Post-merge
1. `mise run provision-ringtail` — deploys k3s token, runner token, NixOS rebuild
2. Register k3s cluster in ArgoCD (first time only):
```fish
ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' | \
sed 's|127.0.0.1|ringtail.tail8d86e.ts.net|' > /tmp/k3s-ringtail.yaml
set -x KUBECONFIG /tmp/k3s-ringtail.yaml
argocd cluster add default --name k3s-ringtail
3. Sync ArgoCD apps in order: 1password-connect-ringtail -> external-secrets-crds-ringtail -> external-secrets-ringtail ->
external-secrets-config-ringtail
4. Verify runner: ssh ringtail 'systemctl status gitea-runner-nix-container-builder'
5. Check Forgejo admin panel for ringtail-nix-builder runner online
6. Test: create containers/<name>/default.nix, tag with <name>-nix-v0.1.0
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209
## Summary
- Bump External Secrets Operator Helm chart from `helm-chart-1.3.1` to `helm-chart-2.0.0` (operator v1.3.2)
- Updates both the operator app and CRDs app `targetRevision`
- No Helm values changes needed — `installCRDs`, `resources`, `webhook`, `certController` keys are unchanged
## Breaking changes in chart 2.0.0
- **Removed providers:** Alibaba and Device42 (unmaintained) — does not affect our 1Password setup
- **Templating engine v1 deprecated** — our ExternalSecrets don't set `engineVersion`, so they use the default (v2)
- **Webhook `failurePolicy`** for SecretStore is now dynamic
## Deployment
1. Sync CRDs first: `argocd app set external-secrets-crds --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets-crds`
2. Sync operator: `argocd app set external-secrets --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets`
3. Verify: `kubectl --context=minikube-indri -n external-secrets get pods`
4. After merge, set both apps back to `--revision main`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/203
## Summary
Review session covering 3 docs, plus a codebase-wide cleanup:
### Docs reviewed
- **connect-to-postgres** — verified end-to-end (psql connection tested), stamped
- **create-release-artifact-workflow** — clarified that `build-blumeops.yaml` is only a version bump example (not a packages API example)
- **deploy-k8s-service** — fixed stale repoURL (`indri:2200` → `forge.ops.eblu.me:2222`), wrong Caddy config keys (`upstream` → `backend`, added missing `host`), updated Homepage group to "Services", added Tailscale tag documentation
### Codebase cleanup
- Migrated all remaining `op item get --fields` calls to `op read` URI syntax across 7 files (docs, READMEs, YAML comments)
- Simplified the `op read` vs `op item get` guidance in CLAUDE.md
## Side findings (not addressed)
- New `immich-pg` CNPG cluster not yet documented in the postgresql reference card
## Test plan
- [x] `psql` connection to `pg.ops.eblu.me` verified
- [x] All pre-commit hooks pass
- [x] `docs-check-links`, `docs-check-index`, `docs-check-frontmatter` pass
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/191
## Summary
- nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz)
- ArgoCD app + k8s manifests (deployment, service, Tailscale ingress)
- Caddy route for `cv.ops.eblu.me`
- Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD
- Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv)
## Deployment steps (after merge)
1. `mise run container-tag-and-release cv v1.0.0`
2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`)
3. Run "Deploy CV" workflow in blumeops (default: latest)
4. `mise run provision-indri -- --tags caddy`
5. Verify at `https://cv.ops.eblu.me/`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169
## Summary
- Deploy Navidrome music streaming server to k8s
- NFS mount for music library from sifaka:/volume1/music (read-only)
- Local PVC for SQLite database and config (10Gi)
- Tailscale ingress for dj.tail8d86e.ts.net
- Caddy reverse proxy for dj.ops.eblu.me
- Homepage annotations for dashboard discovery in Media group
## Deployment and Testing
- [ ] Sync `apps` application to pick up new Application definition
- [ ] Set navidrome app to feature branch and sync
- [ ] Verify NFS mount with `kubectl exec`
- [ ] Provision Caddy for dj.ops.eblu.me
- [ ] Access https://dj.ops.eblu.me and create initial admin user
- [ ] Verify Homepage shows DJ in Media group
- [ ] Reset to main and resync after merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/79
## Summary
- Fix ArgoCD icon (use `argo-cd.png` per Dashboard Icons naming)
- Add Borgmatic backup metrics widget (time since last backup, archive size)
- Add Sifaka NAS disk usage widget (used/total space)
- Create `[[grafana]]` zk card with management notes
## What didn't work
Attempted Grafana iframe embedding for a metrics panel but reverted:
- Homepage iframe widget only supports height classes, not width
- Some panels fail to load even with anonymous auth enabled
- Documented in grafana zk card for future reference
## Deployment and Testing
- [x] ArgoCD icon displays correctly
- [x] Borgmatic metrics show time since backup and archive size
- [x] NAS disk usage shows used/total bytes
- [x] Grafana reverted to authenticated-only access
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/76
## Summary
- Remove hajimari (unmaintained since Oct 2022, broken helm deps)
- Add gethomepage (28k stars, actively maintained, monthly releases)
- Migrate custom apps, bookmarks, and search config
- Enable k8s RBAC for service autodiscovery
- Configure Tailscale ingress at go.tail8d86e.ts.net
## Why the switch
Hajimari hasn't released since October 2022. The helm chart has a broken
dependency (bjw-s/common URL is 404), and unreleased code on main has bugs.
gethomepage has similar k8s autodiscovery via ingress annotations and is
very actively maintained.
## Deployment and Testing
- [ ] Delete hajimari app from ArgoCD
- [ ] Delete hajimari namespace
- [ ] Sync apps to pick up new homepage app
- [ ] Sync homepage app
- [ ] Verify go.ops.eblu.me loads
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/75
- Use chart from forge.ops.eblu.me/eblume/hajimari fork
- Use custom image from registry.ops.eblu.me/blumeops/hajimari
- Enables future customizations (search auto-focus, weather widget)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fix immich app to track `main` branch instead of `feature/immich` for values
- The tailscale-operator ignoreDifferences schema drift will be fixed by syncing the `apps` app
## Deployment and Testing
- [ ] Sync `apps` to fix tailscale-operator schema drift
- [ ] Sync `immich` to pick up correct image versions from main
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/71