C2: migrate immich from minikube to ringtail (mikado chain) #356

Merged
eblume merged 20 commits from mikado/migrate-immich-to-ringtail into main 2026-05-13 16:46:20 -07:00


b21d13fe20 C2(migrate-immich-to-ringtail): finalize chain — strip mikado frontmatter, add changelog
Immich is fully migrated off minikube-indri onto k3s-ringtail. All
six prerequisite cards plus the goal card were converted to historical
documentation by removing their status/branch/requires Mikado frontmatter.
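Stripping the Mikado frontmatter can be sketched in Python. This assumes each card is a Markdown file whose frontmatter is a `---`-delimited YAML block with top-level `key: value` entries. The key names `status`, `branch` and `requires` come from this commit; `strip_mikado_frontmatter` is a hypothetical helper, not the tool actually used.

```python
# Hypothetical helper: drop Mikado bookkeeping keys from a card's frontmatter.
# Assumes "---"-delimited YAML frontmatter with top-level "key: value" lines;
# indented lines and "- item" lines are treated as part of the preceding key.
MIKADO_KEYS = {"status", "branch", "requires"}


def strip_mikado_frontmatter(text: str) -> str:
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter at all
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return text  # unterminated frontmatter; leave the card alone

    kept = []
    skipping = False
    for line in lines[1:end]:
        starts_new_key = line[:1] not in (" ", "\t", "-") and ":" in line
        if starts_new_key:
            key = line.split(":", 1)[0].strip()
            skipping = key in MIKADO_KEYS
        # Continuation lines (indented values, list items) inherit the
        # decision made for the key they belong to.
        if not skipping:
            kept.append(line)

    body = "".join(lines[end + 1:])
    if not any(line.strip() for line in kept):
        return body  # nothing left: drop the frontmatter fences entirely
    return lines[0] + "".join(kept) + lines[end] + body
```

A card that carried only Mikado keys loses its frontmatter block entirely, which matches "converted to historical documentation".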

Changelog fragment added at docs/changelog.d/migrate-immich-to-ringtail.infra.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:46:27 -07:00
7400807be3 C2(migrate-immich-to-ringtail): close immich-cutover-and-decommission
Sequence executed:
1. Quiesced source: immich-server + immich-machine-learning on
   minikube scaled to 0 (done in immich-pg-data-migration).
2. Deleted minikube immich-tailscale Ingress; waited for "photos"
   Tailscale device to deregister.
3. (Promotion of the ringtail pg was already done in immich-pg-data-migration.)
4. Renamed ringtail ingress tls.hosts photos-ringtail -> photos.
5. Caddy was already pointing photos.ops.eblu.me ->
   photos.tail8d86e.ts.net so no Ansible change needed.
6. Smoke test: photos.ops.eblu.me/api/server/ping -> 200,
   /api/server/version -> {"major":2,"minor":6,"patch":3}.
7. Borgmatic continuity: added a ringtail immich-pg-tailscale
   Service (same FQDN as before, immich-pg.tail8d86e.ts.net).
   Verified the borgmatic role can SELECT count(*) FROM asset over the
   tailnet (returned 12681, matching the source).
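The smoke test in step 6 can be approximated with the standard library. This is a sketch: the `https://` scheme is an assumption and the helper names are hypothetical; only the two endpoint paths and the expected 2.6.3 version come from the log above.

```python
import json
import urllib.request

BASE_URL = "https://photos.ops.eblu.me"  # scheme assumed; host from the log
EXPECTED_VERSION = (2, 6, 3)


def parse_version(payload: bytes) -> tuple:
    """Turn a body like {"major":2,"minor":6,"patch":3} into (2, 6, 3)."""
    data = json.loads(payload)
    return (data["major"], data["minor"], data["patch"])


def smoke_test(base_url: str = BASE_URL, timeout: float = 10.0) -> None:
    # urlopen raises HTTPError for 4xx/5xx; the explicit status check also
    # rejects other non-200 successes such as 204.
    with urllib.request.urlopen(f"{base_url}/api/server/ping", timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"ping returned HTTP {resp.status}")
    with urllib.request.urlopen(f"{base_url}/api/server/version", timeout=timeout) as resp:
        version = parse_version(resp.read())
    if version != EXPECTED_VERSION:
        raise RuntimeError(f"unexpected server version {version}")
```

Calling `smoke_test()` from a host that can reach photos.ops.eblu.me raises on any failure and returns silently on success; extra fields in the version response are ignored.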

Decommission:
- Deleted argocd Application "immich" with --cascade (clears
  Deployments, Services, etc. on minikube).
- Pruned the blumeops-pg Application against the branch, which removed
  the Cluster immich-pg, its ExternalSecret, and the old
  immich-pg-tailscale Service from minikube.
- Deleted leftover Released PVs on minikube.
- Deleted the empty immich namespace on minikube.

Did not verify the minikube host memory drop directly (tailscale-ssh
was prompting for re-auth at the time). The caller should confirm via
"docker stats minikube" once SSH is re-authenticated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:42:31 -07:00
7573a72318 C2(migrate-immich-to-ringtail): impl decommission minikube immich; add ringtail immich-pg tailscale service
GitOps decommission of immich + immich-pg on minikube:
- Delete argocd/apps/immich.yaml
- Delete argocd/manifests/immich/ entirely
- Delete argocd/manifests/databases/{immich-pg,external-secret-immich-borgmatic,service-immich-pg-tailscale}.yaml
- Remove those entries from databases/kustomization.yaml

Add ringtail-side immich-pg Tailscale LoadBalancer Service (hostname
"immich-pg") so borgmatic can keep using the same FQDN for nightly
backups. This claims the device name freed by deleting the minikube
service.

The ringtail manifest path stays as argocd/manifests/immich-ringtail/
and the ArgoCD app stays as immich-ringtail — renaming would force a
cascading delete + recreate, with a window where live resources
disappear.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:31:09 -07:00
aad76fc3e0 C2(migrate-immich-to-ringtail): impl rename ringtail immich ingress photos-ringtail -> photos
Minikube immich-tailscale Ingress was deleted; the "photos" Tailscale
device name is now free. Renaming the ringtail ingress claims it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:27:04 -07:00
18e6c7ef5d C2(migrate-immich-to-ringtail): close immich-app-on-ringtail
All three pods Running, 1/1 Ready:
- immich-server: v2.6.3, connected to ringtail pg + valkey
  ("/api/server/ping" returns 200, "/api/server/version" returns
  v2.6.3)
- immich-machine-learning: CUDA variant, RTX 4080 attached
  (nvidia-smi shows 8 GiB used / 16 GiB total — shared with
  frigate via time-slicing), gunicorn workers booted
- immich-valkey: upstream multi-arch docker.io/valkey/valkey:8.1.6

The immich-db Secret in the immich namespace was created manually with
the source's immich-pg-app password (matching the minikube pattern).
Tailscale ingress staging hostname: photos-ringtail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:23:24 -07:00
5a9596c7d9 C2(migrate-immich-to-ringtail): impl add immich Deployments + bump GPU time-slicing
- argocd/manifests/immich-ringtail/: full port of the immich stack
  (server, ML, valkey, services, ingress, pvc-ml-cache) from
  argocd/manifests/immich/, with ringtail-specific tweaks:
  - deployment-ml: runtimeClassName=nvidia, nvidia.com/gpu:1 limit,
    -cuda image tag
  - deployment-valkey + kustomization: drop the
    registry.ops.eblu.me/blumeops/valkey mirror (arm64-only), use
    upstream docker.io/valkey/valkey:8.1.6 (multi-arch)
  - ingress-tailscale: tls.hosts=[photos-ringtail] for staging
- argocd/apps/immich-ringtail.yaml: new ArgoCD app (manual sync,
  ringtail destination)
- argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml:
  bump replicas 2 -> 4 so the ringtail GPU can be shared by
  frigate + ollama + immich-ml

The immich-db Secret in the immich namespace is created manually
(matching minikube pattern) — see argocd/apps/immich-ringtail.yaml
header for the procedure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:14:07 -07:00
674ca2ced9 C2(migrate-immich-to-ringtail): close immich-pg-data-migration
Migration via CNPG pg_basebackup (Option A) completed cleanly.

Sequence:
1. Stopped immich-server + immich-machine-learning on minikube
   (scaled to 0). valkey + source pg kept running.
2. Copied minikube's immich-pg-ca + immich-pg-replication secrets
   to ringtail as source-immich-pg-{ca,replication}.
3. Recreated the ringtail immich-pg Cluster with
   bootstrap.pg_basebackup, replica.enabled=true, externalClusters
   pointing at immich-pg.tail8d86e.ts.net via the streaming_replica
   TLS cert.
4. The base backup completed in ~50s; the replica caught up via streaming.
5. Verified row counts identical between source and replica:
   asset=12681, user=1, album=28, smart_search=9624,
   activity=0, asset_face=3917.
6. Promoted via replica.enabled=false; pg_is_in_recovery() now returns
   false. Write test passed. All 7 expected extensions are present in the
   immich db (vector, vchord, cube, earthdistance, pg_trgm, unaccent,
   uuid-ossp).
7. Pruned bootstrap + externalClusters blocks; deleted out-of-band
   replication secrets.
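The parity check in step 5 and the extension check in step 6 reduce to two set/dict comparisons. A sketch, assuming the counts and extension names have already been fetched (for example with `SELECT count(*) FROM "user"`, where the quotes are needed because USER is a reserved word in PostgreSQL, and `SELECT extname FROM pg_extension`). The figures are copied from this log; the helper names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Row counts reported for the minikube source in this log.
SOURCE_COUNTS = {
    "asset": 12681, "user": 1, "album": 28,
    "smart_search": 9624, "activity": 0, "asset_face": 3917,
}
# Extensions the log lists as expected in the immich db.
EXPECTED_EXTENSIONS = {
    "vector", "vchord", "cube", "earthdistance",
    "pg_trgm", "unaccent", "uuid-ossp",
}


def count_mismatches(
    source: Dict[str, int], replica: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Tables whose counts differ, or that exist on only one side."""
    tables = sorted(set(source) | set(replica))
    return {
        t: (source.get(t), replica.get(t))
        for t in tables
        if source.get(t) != replica.get(t)
    }


def missing_extensions(installed: Iterable[str]) -> Set[str]:
    """Expected extensions absent from the installed list."""
    return EXPECTED_EXTENSIONS - set(installed)
```

An empty result from both functions corresponds to the "identical counts" and "all 7 present" outcomes recorded above.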

Source minikube immich-pg is intact and untouched — recovery path
remains available until immich-cutover-and-decommission completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
e59bbc9348 C2(migrate-immich-to-ringtail): impl prune externalClusters + bootstrap from immich-pg manifest
Migration done, cluster promoted. Pruning the externalClusters block
and bootstrap.pg_basebackup reference eliminates the footgun where a
future replica.enabled=true would demote this primary against the
stale minikube source.
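The promote-then-prune ordering can be sketched as a transformation on the Cluster `spec` parsed into a plain dict. The field names `replica.enabled`, `bootstrap.pg_basebackup` and `externalClusters` are the CNPG ones named in this log; the helper and the `minikube-immich-pg` source name used in the example are hypothetical. The sketch leaves any `replica` stanza in place because the log does not say how that was handled.

```python
import copy
from typing import Any, Dict


def prune_migration_blocks(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Cluster spec with the migration-only blocks removed.

    Refuses to run while the cluster is still a replica, so pruning can only
    follow promotion.
    """
    if spec.get("replica", {}).get("enabled", False):
        raise ValueError("still a replica: set replica.enabled=false before pruning")
    pruned = copy.deepcopy(spec)  # never mutate the caller's manifest
    pruned.pop("externalClusters", None)
    bootstrap = pruned.get("bootstrap")
    if isinstance(bootstrap, dict):
        bootstrap.pop("pg_basebackup", None)
        if not bootstrap:
            del pruned["bootstrap"]  # nothing left to keep
    return pruned
```

With `externalClusters` gone, there is no stale source left for a later `replica.enabled=true` to stream from, which is the footgun this commit removes.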

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
431d538ab1 C2(migrate-immich-to-ringtail): impl promote ringtail immich-pg from replica to primary
Row counts verified equal between source (minikube) and replica
(ringtail) across asset (12681), user (1), album (28),
smart_search (9624), activity (0), asset_face (3917). Source immich
is scaled to 0 — no writes since the basebackup completed.

Flipping replica.enabled=false to promote. The externalClusters and
bootstrap.pg_basebackup blocks are left in place as documentation
(CNPG ignores them after initialization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
5752f00343 C2(migrate-immich-to-ringtail): impl bootstrap immich-pg via pg_basebackup from minikube
Replaces the initdb bootstrap with a pg_basebackup from the minikube
source over the tailnet (immich-pg.tail8d86e.ts.net). The ringtail
cluster starts in replica mode (replica.enabled=true), streaming WAL
from the source. Promotion happens by flipping replica.enabled=false
after the replica catches up and the source is quiesced.

Uses the source's streaming_replica TLS cert + CA, copied to ringtail
as out-of-band secrets (source-immich-pg-replication,
source-immich-pg-ca) — the standard CNPG-to-CNPG migration auth path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
be5255d685 C2(migrate-immich-to-ringtail): close sifaka-nfs-from-ringtail
Verified on k3s-ringtail:
- Sifaka NFS export /volume1/photos covers 192.168.1.0/24 +
  100.64.0.0/10. Ringtail at 192.168.1.21 is in scope; no DSM rule
  changes needed.
- nfs-test pod mounted the share, read existing library/ thumbs/
  backups/ encoded-video/ profile/, wrote a temp file, deleted it.
- DNS resolution: sifaka → 192.168.1.203 (LAN). NFS traffic stays
  off tailnet, avoiding the sifaka-tailscale-userspace concern.
- The committed PV and PVC bound on first apply (RWX, 2Ti).
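The export-scope and routing checks above reduce to CIDR membership tests. Below is a sketch with Python's `ipaddress` module. The addresses come from the log, and the helper names are hypothetical. It checks address ranges only and cannot see other export options such as squash or read/write settings.

```python
import ipaddress

# Client ranges on the sifaka /volume1/photos export, per the log.
EXPORT_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),  # home LAN
    ipaddress.ip_network("100.64.0.0/10"),   # CGNAT range Tailscale assigns from
]
LAN = EXPORT_NETWORKS[0]


def export_allows(client_ip: str) -> bool:
    """True if the client address falls inside any exported range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in EXPORT_NETWORKS)


def stays_on_lan(server_ip: str) -> bool:
    """True if the resolved NFS server address is a LAN address, not a tailnet one."""
    return ipaddress.ip_address(server_ip) in LAN
```

For example, `export_allows("192.168.1.21")` covers ringtail, and `stays_on_lan("192.168.1.203")` covers the DNS answer for sifaka.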

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
9f8d627ce8 C2(migrate-immich-to-ringtail): impl add ringtail-side NFS PV/PVC for immich library
Mirrors argocd/manifests/immich/pv-nfs.yaml + pvc.yaml. PV renamed
to immich-library-nfs-pv-ringtail to avoid confusion with the
minikube side (PVs are cluster-scoped; both can coexist).

Initial kustomization.yaml in argocd/manifests/immich-ringtail/
holds just the storage bits today; deployments/services/ingress
will be added in immich-app-on-ringtail.

Verified: PVC binds to PV on k3s-ringtail; mount test from a
busybox pod read existing photo library dirs, wrote and deleted a
test file. DNS resolves sifaka to 192.168.1.203 so NFS traffic
stays on the LAN, off the tailnet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
4c6695868d C2(migrate-immich-to-ringtail): close immich-pg-on-ringtail
Verified on k3s-ringtail:
- Cluster immich-pg reached "Cluster in healthy state" (1/1 instance)
- borgmatic role: rolcanlogin=t, member of pg_read_all_data
- ExternalSecret immich-pg-borgmatic: Ready=True, username=borgmatic
- Extensions vchord, vector, cube, earthdistance installed in postgres db
  (immich db extensions deferred to app startup per the card)

10 GiB local-path storage; same VectorChord image as the minikube source.
The bootstrap is an empty initdb for now; it will be rewritten once
immich-pg-data-migration picks its import method.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
1d9d8867fb C2(migrate-immich-to-ringtail): impl add immich-pg cluster + app on ringtail
Mirror of argocd/manifests/databases/immich-pg.yaml on ringtail:
- Same VectorChord image (PG17 + VectorChord 0.5.0)
- Same extensions (vector, vchord, cube, earthdistance) via postInitSQL
- Same managed borgmatic role with pg_read_all_data
- 10 GiB local-path storage (matches minikube source)
- shared_preload_libraries: vchord.so
- Empty initdb for now; the bootstrap block will be rewritten once
  immich-pg-data-migration picks its import method.

ArgoCD app databases-ringtail targets ringtail/databases.
ExternalSecret reuses the onepassword-blumeops ClusterSecretStore that
already exists on ringtail via external-secrets-ringtail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
e1fe5d2ea6 C2(migrate-immich-to-ringtail): close cnpg-on-ringtail
Verified: cnpg-controller-manager pod Ready on k3s-ringtail; CRDs
clusters.postgresql.cnpg.io etc. installed; ArgoCD app Synced/Healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
b37ac0750f C2(migrate-immich-to-ringtail): impl add cloudnative-pg-ringtail ArgoCD app
Sibling of cloudnative-pg.yaml targeting k3s-ringtail. Same mirror
(mirrors/cloudnative-pg) and release (v1.27.1), same sync options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:21 -07:00
bca5c40663 C2(migrate-immich-to-ringtail): plan capture GPU contention + valkey arch on immich-app-on-ringtail
Two discovered prereqs while bringing the immich stack up on ringtail:

1. nvidia-device-plugin time-slicing on ringtail advertises only 2
   virtual GPUs. Frigate + Ollama consume both. immich-ml's
   nvidia.com/gpu:1 request cannot be scheduled until replicas is raised to >= 3.
2. The registry.ops.eblu.me/blumeops/valkey image was built on indri
   (arm64) and is single-arch. Pulling on ringtail (amd64)
   crashloops with "exec format error". Use the upstream multi-arch
   docker.io/valkey/valkey image directly until the mirror gets a
   multi-arch tag.
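The scheduling arithmetic in item 1 can be sketched as slot counting. Time-slicing advertises the physical GPU as `replicas` schedulable `nvidia.com/gpu` units but provides no memory isolation, so all tenants still share the card's 16 GiB. The sketch assumes Frigate and Ollama each request exactly one unit, which "consume both" implies but the log does not state. Helper names are hypothetical.

```python
from typing import Dict


def min_replicas(requests: Dict[str, int]) -> int:
    """Smallest time-slicing replica count that fits every request."""
    return sum(requests.values())


def free_gpu_slots(replicas: int, requests: Dict[str, int]) -> int:
    """Virtual GPUs left over (negative means something stays Pending)."""
    return replicas - min_replicas(requests)
```

With Frigate, Ollama and immich-ml at one unit each, three replicas is the minimum. The bump to 4 in the later impl commit leaves one spare slot.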

Card body updated to capture both. Next impl incorporates the fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:12:09 -07:00
355be3fbc4 C2(migrate-immich-to-ringtail): plan correct extension-verification on immich-pg-on-ringtail card
CNPG's bootstrap.initdb.postInitSQL runs against the postgres
superuser database, not the application database. Extensions
declared there end up in the postgres db, not the immich db. The
Immich app installs them in its own database at startup.

This matches the existing minikube cluster's behavior — same
Cluster CR, same effect. Adjusting the card's verification to
reflect reality rather than (incorrectly) requiring extensions to
be present in the immich db pre-app-deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:25:30 -07:00
db37e7cc3e C2(migrate-immich-to-ringtail): plan capture two discovered concerns
1. Registering new ArgoCD apps from a feature branch: the app-of-apps
   "apps" Application is self-managing (re-reads apps.yaml on every
   sync, which pins targetRevision: main). So setting its revision to
   a branch doesn't stick across syncs, and new app definitions on a
   branch are invisible to the cluster via the normal flow. The goal
   card now documents the kubectl-apply + per-new-app `argocd app set
   --revision <branch>` workaround.

2. Tailscale device-name collision on cutover. The minikube immich
   ingress claims tailnet hostname "photos" (tls.hosts: [photos]).
   The ringtail ingress can't claim the same name while minikube's is
   alive (Tailscale enforces uniqueness). Staging uses
   tls.hosts: [photos-ringtail], with the rename to "photos" baked
   into immich-cutover-and-decommission step 2 + step 5.
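The kubectl-apply plus `argocd app set --revision` workaround from item 1 can be expressed as a small command builder. It makes two assumptions. First, each Application's `metadata.name` matches its file stem. This holds for `argocd/apps/immich-ringtail.yaml` and the `immich-ringtail` app per this log, but is unverified for others. Second, kubectl is pointed at the cluster where ArgoCD runs. The helpers are hypothetical and only build argv lists, so nothing runs until the lists are passed to `subprocess.run`.

```python
import shlex
from pathlib import Path
from typing import List


def branch_registration_commands(app_files: List[str], branch: str) -> List[List[str]]:
    cmds: List[List[str]] = []
    for app_file in app_files:
        app_name = Path(app_file).stem  # assumes metadata.name == file stem
        # Create the Application directly, bypassing the main-pinned "apps" app.
        cmds.append(["kubectl", "apply", "-f", app_file])
        # Point the new Application at the feature branch.
        cmds.append(["argocd", "app", "set", app_name, "--revision", branch])
    return cmds


def render(cmds: List[List[str]]) -> str:
    """Shell-quoted, one command per line, for review before running."""
    return "\n".join(shlex.join(c) for c in cmds)
```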

Card dependency graph unchanged; no new cards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:21:57 -07:00
4623733695 C2(migrate-immich-to-ringtail): plan introduce mikado chain
Goal: move immich (server, ML, valkey, postgres) off minikube-indri
onto k3s-ringtail. Immich is the largest single tenant on minikube
(~1.5 GiB resident) and minikube is memory-saturated.

Prerequisite cards:
- cnpg-on-ringtail
- immich-pg-on-ringtail (requires cnpg-on-ringtail)
- immich-pg-data-migration (requires immich-pg-on-ringtail)
- sifaka-nfs-from-ringtail
- immich-app-on-ringtail (requires immich-pg-on-ringtail, sifaka-nfs-from-ringtail)
- immich-cutover-and-decommission (requires immich-pg-data-migration, immich-app-on-ringtail)
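The prerequisite list above is a dependency graph, so a valid work order and the set of currently unblocked cards follow from a topological sort. A sketch using Python's `graphlib`. The `requires` edges are copied from the list, the goal card is omitted, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# card -> cards it requires, as listed in this commit
REQUIRES: Dict[str, Set[str]] = {
    "cnpg-on-ringtail": set(),
    "immich-pg-on-ringtail": {"cnpg-on-ringtail"},
    "immich-pg-data-migration": {"immich-pg-on-ringtail"},
    "sifaka-nfs-from-ringtail": set(),
    "immich-app-on-ringtail": {"immich-pg-on-ringtail", "sifaka-nfs-from-ringtail"},
    "immich-cutover-and-decommission": {"immich-pg-data-migration", "immich-app-on-ringtail"},
}


def work_order(requires: Dict[str, Set[str]]) -> List[str]:
    """One order in which every card comes after its prerequisites."""
    return list(TopologicalSorter(requires).static_order())


def ready_cards(requires: Dict[str, Set[str]], done: Set[str]) -> List[str]:
    """Cards not yet done whose prerequisites are all done."""
    return sorted(c for c, deps in requires.items() if c not in done and deps <= done)
```

Initially only cnpg-on-ringtail and sifaka-nfs-from-ringtail are unblocked, and immich-cutover-and-decommission is always last.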

Data loss is a critical failure; downtime is acceptable. The cutover
plan favors a CNPG externalCluster basebackup (Option A) with pg_dump
as the documented fallback (Option B).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:05:40 -07:00