From 947e4310c306c36e1096f98f5431cf910554d823 Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Wed, 13 May 2026 16:46:17 -0700
Subject: [PATCH] C2: migrate immich from minikube to ringtail (mikado chain) (#356)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

C2 Mikado chain to move the entire Immich stack (server, ML, valkey, postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the largest single tenant on minikube (~1.5 GiB resident) and minikube is currently memory-saturated (97% RAM, swapping). This is the first concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only — 7 cards (1 goal + 6 prerequisites). Implementation cycles follow per the Mikado Branch Invariant.

## Goal end-state

- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.

## Cards

| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |

## Key constraints

- **No data loss.** Downtime is acceptable; data loss is not. Two surfaces matter: postgres (ML embeddings, face data — slow to re-derive) and the library files (don't move, but NFS access from ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster` basebackup → promote. Option B is `pg_dump`/`pg_restore` as a documented fallback.
  Either way, dry-run against a scratch cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube would block the whole decommission, and Immich is chatty with pg so tailnet round-trips would hurt.

## Test plan

- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the chain correctly.
- [ ] Per-card implementation cycles land separately (commit convention enforced by hook).

Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/356
---
 argocd/apps/cloudnative-pg-ringtail.yaml | 27 ++++
 argocd/apps/databases-ringtail.yaml | 26 ++++
 argocd/apps/immich-ringtail.yaml | 31 ++++
 argocd/apps/immich.yaml | 30 ----
 .../external-secret-immich-borgmatic.yaml | 15 +-
 .../databases-ringtail/immich-pg.yaml | 53 +++++++
 .../databases-ringtail/kustomization.yaml | 9 ++
 .../service-immich-pg-tailscale.yaml | 8 +-
 argocd/manifests/databases/immich-pg.yaml | 69 ---------
 argocd/manifests/databases/kustomization.yaml | 3 -
 .../deployment-ml.yaml | 6 +
 .../deployment-server.yaml | 0
 .../deployment-valkey.yaml | 0
 .../ingress-tailscale.yaml | 15 +-
 .../kustomization.yaml | 13 +-
 argocd/manifests/immich-ringtail/pv-nfs.yaml | 29 ++++
 .../pvc-ml-cache.yaml | 0
 .../{immich => immich-ringtail}/pvc.yaml | 6 +-
 .../service-ml.yaml | 0
 .../service-valkey.yaml | 0
 .../{immich => immich-ringtail}/service.yaml | 0
 argocd/manifests/immich/README.md | 115 ---------------
 argocd/manifests/immich/pv-nfs.yaml | 22 ---
 .../time-slicing-config.yaml | 2 +-
 .../migrate-immich-to-ringtail.infra.md | 13 ++
 docs/how-to/immich/cnpg-on-ringtail.md | 52 +++++++
 docs/how-to/immich/immich-app-on-ringtail.md | 91 ++++++++++++
 .../immich/immich-cutover-and-decommission.md | 103 ++++++++++++++
 .../how-to/immich/immich-pg-data-migration.md | 79 +++++++++++
 docs/how-to/immich/immich-pg-on-ringtail.md | 69 +++++++++
 .../immich/migrate-immich-to-ringtail.md | 132 ++++++++++++++++++
 .../how-to/immich/sifaka-nfs-from-ringtail.md | 67 +++++++++
 32 files changed, 820 insertions(+), 265 deletions(-)
 create mode 100644 argocd/apps/cloudnative-pg-ringtail.yaml
 create mode 100644 argocd/apps/databases-ringtail.yaml
 create mode 100644 argocd/apps/immich-ringtail.yaml
 delete mode 100644 argocd/apps/immich.yaml
 rename argocd/manifests/{databases => databases-ringtail}/external-secret-immich-borgmatic.yaml (65%)
 create mode 100644 argocd/manifests/databases-ringtail/immich-pg.yaml
 create mode 100644 argocd/manifests/databases-ringtail/kustomization.yaml
 rename argocd/manifests/{databases => databases-ringtail}/service-immich-pg-tailscale.yaml (57%)
 delete mode 100644 argocd/manifests/databases/immich-pg.yaml
 rename argocd/manifests/{immich => immich-ringtail}/deployment-ml.yaml (83%)
 rename argocd/manifests/{immich => immich-ringtail}/deployment-server.yaml (100%)
 rename argocd/manifests/{immich => immich-ringtail}/deployment-valkey.yaml (100%)
 rename argocd/manifests/{immich => immich-ringtail}/ingress-tailscale.yaml (62%)
 rename argocd/manifests/{immich => immich-ringtail}/kustomization.yaml (61%)
 create mode 100644 argocd/manifests/immich-ringtail/pv-nfs.yaml
 rename argocd/manifests/{immich => immich-ringtail}/pvc-ml-cache.yaml (100%)
 rename argocd/manifests/{immich => immich-ringtail}/pvc.yaml (54%)
 rename argocd/manifests/{immich => immich-ringtail}/service-ml.yaml (100%)
 rename argocd/manifests/{immich => immich-ringtail}/service-valkey.yaml (100%)
 rename argocd/manifests/{immich => immich-ringtail}/service.yaml (100%)
 delete mode 100644 argocd/manifests/immich/README.md
 delete mode 100644 argocd/manifests/immich/pv-nfs.yaml
 create mode 100644 docs/changelog.d/migrate-immich-to-ringtail.infra.md
 create mode 100644 docs/how-to/immich/cnpg-on-ringtail.md
 create mode 100644 docs/how-to/immich/immich-app-on-ringtail.md
 create mode 100644 docs/how-to/immich/immich-cutover-and-decommission.md
 create mode 100644 docs/how-to/immich/immich-pg-data-migration.md
 create mode 100644
docs/how-to/immich/immich-pg-on-ringtail.md create mode 100644 docs/how-to/immich/migrate-immich-to-ringtail.md create mode 100644 docs/how-to/immich/sifaka-nfs-from-ringtail.md diff --git a/argocd/apps/cloudnative-pg-ringtail.yaml b/argocd/apps/cloudnative-pg-ringtail.yaml new file mode 100644 index 0000000..fa7bba0 --- /dev/null +++ b/argocd/apps/cloudnative-pg-ringtail.yaml @@ -0,0 +1,27 @@ +# CloudNativePG Operator for ringtail k3s cluster +# Deploys the operator only; PostgreSQL clusters are created separately +# +# Sibling of cloudnative-pg.yaml (minikube). Same mirror, same release, +# different destination. Both apps will coexist during the immich +# migration; the minikube one is removed at the end of the broader +# indri-k8s decommission. +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: cloudnative-pg-ringtail + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git + targetRevision: v1.27.1 + path: releases + directory: + include: 'cnpg-1.27.1.yaml' + destination: + server: https://ringtail.tail8d86e.ts.net:6443 + namespace: cnpg-system + syncPolicy: + syncOptions: + - CreateNamespace=true + - ServerSideApply=true # Required for large CRDs that exceed annotation size limit diff --git a/argocd/apps/databases-ringtail.yaml b/argocd/apps/databases-ringtail.yaml new file mode 100644 index 0000000..00de4e3 --- /dev/null +++ b/argocd/apps/databases-ringtail.yaml @@ -0,0 +1,26 @@ +# Databases on ringtail k3s. +# +# Today: only immich-pg (CNPG Cluster) + its borgmatic ExternalSecret. +# More databases may move here as the indri-k8s decommission proceeds. 
+# +# Prerequisites: +# - cloudnative-pg-ringtail (operator must exist before the Cluster CR) +# - external-secrets-ringtail + 1password-connect-ringtail (for the +# immich-pg-borgmatic ExternalSecret to sync) +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: databases-ringtail + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/databases-ringtail + destination: + server: https://ringtail.tail8d86e.ts.net:6443 + namespace: databases + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/apps/immich-ringtail.yaml b/argocd/apps/immich-ringtail.yaml new file mode 100644 index 0000000..c93cbee --- /dev/null +++ b/argocd/apps/immich-ringtail.yaml @@ -0,0 +1,31 @@ +# Immich on ringtail k3s. +# +# Staging deployment; the minikube `immich` app remains in parallel +# until cutover. See [[immich-cutover-and-decommission]] for the +# routing flip + minikube cleanup. +# +# Prerequisites: +# - cnpg-on-ringtail + databases-ringtail (postgres) +# - 1password-connect-ringtail + external-secrets-ringtail (not used +# by this app today — immich-db Secret is created manually, +# matching the minikube pattern) +# - The immich-db Secret in the immich namespace, holding the +# password for the `immich` postgres role (copied from the source +# immich-pg-app Secret at migration time). 
+apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: immich-ringtail + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/immich-ringtail + destination: + server: https://ringtail.tail8d86e.ts.net:6443 + namespace: immich + syncPolicy: + syncOptions: + - CreateNamespace=true diff --git a/argocd/apps/immich.yaml b/argocd/apps/immich.yaml deleted file mode 100644 index 7efd263..0000000 --- a/argocd/apps/immich.yaml +++ /dev/null @@ -1,30 +0,0 @@ -# Immich - Self-hosted photo and video management -# High-performance Google Photos/iCloud alternative with AI features -# -# Kustomize manifests in argocd/manifests/immich/ -# Components: server, machine-learning, valkey (Redis) -# -# Prerequisites: -# 1. Create immich namespace and secrets: -# kubectl create namespace immich -# kubectl --context=minikube-indri create secret generic immich-db -n immich \ -# --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)" -# 2. Create immich-pg database and user (see immich-pg app) -# 3. 
NFS share on sifaka at /volume1/photos with read/write for indri -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: immich - namespace: argocd -spec: - project: default - source: - repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git - targetRevision: main - path: argocd/manifests/immich - destination: - server: https://kubernetes.default.svc - namespace: immich - syncPolicy: - syncOptions: - - CreateNamespace=true diff --git a/argocd/manifests/databases/external-secret-immich-borgmatic.yaml b/argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml similarity index 65% rename from argocd/manifests/databases/external-secret-immich-borgmatic.yaml rename to argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml index 8801c1a..3d1fc14 100644 --- a/argocd/manifests/databases/external-secret-immich-borgmatic.yaml +++ b/argocd/manifests/databases-ringtail/external-secret-immich-borgmatic.yaml @@ -1,9 +1,12 @@ # ExternalSecret for borgmatic backup user password on immich-pg cluster +# (ringtail k3s). +# +# Mirror of argocd/manifests/databases/external-secret-immich-borgmatic.yaml. +# The onepassword-blumeops ClusterSecretStore exists on ringtail via the +# external-secrets-ringtail app. # -# Reuses the same 1Password item as blumeops-pg-borgmatic. 
# 1Password item: "borgmatic" in blumeops vault # Field: "db-password" -# apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: @@ -23,7 +26,7 @@ spec: username: borgmatic password: "{{ .password }}" data: - - secretKey: password - remoteRef: - key: borgmatic - property: db-password + - secretKey: password + remoteRef: + key: borgmatic + property: db-password diff --git a/argocd/manifests/databases-ringtail/immich-pg.yaml b/argocd/manifests/databases-ringtail/immich-pg.yaml new file mode 100644 index 0000000..982bc43 --- /dev/null +++ b/argocd/manifests/databases-ringtail/immich-pg.yaml @@ -0,0 +1,53 @@ +# PostgreSQL Cluster for Immich on ringtail k3s. +# +# Initially bootstrapped via CNPG pg_basebackup from the minikube +# immich-pg cluster on 2026-05-13, then promoted to primary. The +# externalClusters + bootstrap.pg_basebackup blocks have been pruned +# from this manifest now that the migration is complete — leaving +# them around is a footgun (re-enabling replica.enabled=true would +# try to demote this cluster against a stale source). See +# [[immich-pg-data-migration]] for the procedure used. 
+apiVersion: postgresql.cnpg.io/v1 +kind: Cluster +metadata: + name: immich-pg + namespace: databases +spec: + instances: 1 + imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0 + + storage: + size: 10Gi + storageClass: local-path + + # Managed roles + managed: + roles: + - name: borgmatic + login: true + connectionLimit: -1 + ensure: present + inherit: true + inRoles: + - pg_read_all_data + passwordSecret: + name: immich-pg-borgmatic + + resources: + requests: + memory: "256Mi" + cpu: "100m" + limits: + memory: "1Gi" + cpu: "500m" + + postgresql: + shared_preload_libraries: + - "vchord.so" + parameters: + max_connections: "50" + shared_buffers: "128MB" + password_encryption: "scram-sha-256" + pg_hba: + - host all all 0.0.0.0/0 scram-sha-256 + - host all all ::/0 scram-sha-256 diff --git a/argocd/manifests/databases-ringtail/kustomization.yaml b/argocd/manifests/databases-ringtail/kustomization.yaml new file mode 100644 index 0000000..971e2d4 --- /dev/null +++ b/argocd/manifests/databases-ringtail/kustomization.yaml @@ -0,0 +1,9 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +namespace: databases + +resources: + - immich-pg.yaml + - external-secret-immich-borgmatic.yaml + - service-immich-pg-tailscale.yaml diff --git a/argocd/manifests/databases/service-immich-pg-tailscale.yaml b/argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml similarity index 57% rename from argocd/manifests/databases/service-immich-pg-tailscale.yaml rename to argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml index 78891dd..92deb14 100644 --- a/argocd/manifests/databases/service-immich-pg-tailscale.yaml +++ b/argocd/manifests/databases-ringtail/service-immich-pg-tailscale.yaml @@ -1,6 +1,8 @@ -# Tailscale LoadBalancer for immich-pg PostgreSQL access -# Canonical hostname: immich-pg.tail8d86e.ts.net -# Caddy L4 proxies pg.ops.eblu.me:5433 → this service for borgmatic backups +# Tailscale LoadBalancer for immich-pg 
PostgreSQL access on ringtail. +# Canonical hostname: immich-pg.tail8d86e.ts.net (claimed from the +# minikube side after the minikube service was removed during the +# immich-to-ringtail migration). Borgmatic on indri uses this +# hostname for nightly backups. apiVersion: v1 kind: Service metadata: diff --git a/argocd/manifests/databases/immich-pg.yaml b/argocd/manifests/databases/immich-pg.yaml deleted file mode 100644 index 74c6f4e..0000000 --- a/argocd/manifests/databases/immich-pg.yaml +++ /dev/null @@ -1,69 +0,0 @@ -# PostgreSQL Cluster for Immich -# Uses VectorChord (successor to pgvecto.rs) for AI-powered vector search -# See: https://github.com/immich-app/immich/discussions/9060 -# Managed by CloudNativePG operator -apiVersion: postgresql.cnpg.io/v1 -kind: Cluster -metadata: - name: immich-pg - namespace: databases -spec: - instances: 1 - # VectorChord image for PostgreSQL 17 with VectorChord 0.5.0 - # Immich v2.4.1 requires VectorChord >=0.3 <0.6 - # See: https://github.com/tensorchord/VectorChord - imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0 - - storage: - size: 10Gi - storageClass: standard - - # Bootstrap creates initial database and owner - bootstrap: - initdb: - database: immich - owner: immich - postInitSQL: - # Extensions required by Immich - - CREATE EXTENSION IF NOT EXISTS vector; - - CREATE EXTENSION IF NOT EXISTS vchord CASCADE; - - CREATE EXTENSION IF NOT EXISTS cube CASCADE; - - CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE; - - # Managed roles - # Note: connectionLimit, ensure, inherit are CNPG defaults added to prevent ArgoCD drift - managed: - roles: - # borgmatic read-only user for backups - - name: borgmatic - login: true - connectionLimit: -1 - ensure: present - inherit: true - inRoles: - - pg_read_all_data - passwordSecret: - name: immich-pg-borgmatic - - # Resource limits for minikube environment - resources: - requests: - memory: "256Mi" - cpu: "100m" - limits: - memory: "1Gi" - cpu: "500m" - - # PostgreSQL 
configuration - postgresql: - # VectorChord requires vchord.so in shared_preload_libraries - shared_preload_libraries: - - "vchord.so" - parameters: - max_connections: "50" - shared_buffers: "128MB" - password_encryption: "scram-sha-256" - pg_hba: - # Allow connections from k8s pods - - host all all 0.0.0.0/0 scram-sha-256 - - host all all ::/0 scram-sha-256 diff --git a/argocd/manifests/databases/kustomization.yaml b/argocd/manifests/databases/kustomization.yaml index b25e09e..692285a 100644 --- a/argocd/manifests/databases/kustomization.yaml +++ b/argocd/manifests/databases/kustomization.yaml @@ -5,13 +5,10 @@ namespace: databases resources: - blumeops-pg.yaml - - immich-pg.yaml - service-tailscale.yaml - - service-immich-pg-tailscale.yaml - service-metrics-tailscale.yaml - external-secret-eblume.yaml - external-secret-borgmatic.yaml - - external-secret-immich-borgmatic.yaml - external-secret-teslamate.yaml - external-secret-authentik.yaml - external-secret-paperless.yaml diff --git a/argocd/manifests/immich/deployment-ml.yaml b/argocd/manifests/immich-ringtail/deployment-ml.yaml similarity index 83% rename from argocd/manifests/immich/deployment-ml.yaml rename to argocd/manifests/immich-ringtail/deployment-ml.yaml index 57c4242..5ea8035 100644 --- a/argocd/manifests/immich/deployment-ml.yaml +++ b/argocd/manifests/immich-ringtail/deployment-ml.yaml @@ -16,11 +16,16 @@ spec: app: immich component: machine-learning spec: + runtimeClassName: nvidia securityContext: seccompProfile: type: RuntimeDefault containers: - name: machine-learning + # ringtail uses the -cuda tag (set in kustomization.yaml) + # to take advantage of the RTX 4080 via the nvidia + # device plugin. Time-slicing is configured for 4 replicas + # so frigate + ollama + this pod can share. 
image: ghcr.io/immich-app/immich-machine-learning:kustomized ports: - name: http @@ -57,6 +62,7 @@ spec: cpu: "100m" limits: memory: "4Gi" + nvidia.com/gpu: "1" volumes: - name: cache persistentVolumeClaim: diff --git a/argocd/manifests/immich/deployment-server.yaml b/argocd/manifests/immich-ringtail/deployment-server.yaml similarity index 100% rename from argocd/manifests/immich/deployment-server.yaml rename to argocd/manifests/immich-ringtail/deployment-server.yaml diff --git a/argocd/manifests/immich/deployment-valkey.yaml b/argocd/manifests/immich-ringtail/deployment-valkey.yaml similarity index 100% rename from argocd/manifests/immich/deployment-valkey.yaml rename to argocd/manifests/immich-ringtail/deployment-valkey.yaml diff --git a/argocd/manifests/immich/ingress-tailscale.yaml b/argocd/manifests/immich-ringtail/ingress-tailscale.yaml similarity index 62% rename from argocd/manifests/immich/ingress-tailscale.yaml rename to argocd/manifests/immich-ringtail/ingress-tailscale.yaml index 59a4c05..f0b5fe1 100644 --- a/argocd/manifests/immich/ingress-tailscale.yaml +++ b/argocd/manifests/immich-ringtail/ingress-tailscale.yaml @@ -1,6 +1,9 @@ -# Tailscale Ingress for Immich -# Exposes Immich at photos.tail8d86e.ts.net -# Caddy will proxy photos.ops.eblu.me to this endpoint +# Tailscale ProxyGroup Ingress for Immich on ringtail. +# +# Production hostname: photos.tail8d86e.ts.net +# (during the cutover window this was photos-ringtail; the minikube +# ingress was torn down before this was renamed to photos to avoid +# the Tailscale device-name collision.) 
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: @@ -16,12 +19,6 @@ metadata: gethomepage.dev/description: "Photo management" gethomepage.dev/href: "https://photos.ops.eblu.me" gethomepage.dev/pod-selector: "app=immich,component=server" - # TODO: Add Immich widget - requires API key from Account Settings > API Keys - # See: https://gethomepage.dev/widgets/services/immich/ - # gethomepage.dev/widget.type: "immich" - # gethomepage.dev/widget.url: "https://photos.ops.eblu.me" - # gethomepage.dev/widget.key: "{{HOMEPAGE_VAR_IMMICH_API_KEY}}" - # gethomepage.dev/widget.version: "2" spec: ingressClassName: tailscale rules: diff --git a/argocd/manifests/immich/kustomization.yaml b/argocd/manifests/immich-ringtail/kustomization.yaml similarity index 61% rename from argocd/manifests/immich/kustomization.yaml rename to argocd/manifests/immich-ringtail/kustomization.yaml index 5f8d02b..c1f639e 100644 --- a/argocd/manifests/immich/kustomization.yaml +++ b/argocd/manifests/immich-ringtail/kustomization.yaml @@ -1,7 +1,8 @@ ---- apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization + namespace: immich + resources: - deployment-server.yaml - deployment-ml.yaml @@ -13,11 +14,15 @@ resources: - pv-nfs.yaml - pvc.yaml - ingress-tailscale.yaml + images: - name: ghcr.io/immich-app/immich-server newTag: v2.6.3 - name: ghcr.io/immich-app/immich-machine-learning - newTag: v2.6.3 + # CUDA variant of the same release — ringtail has an RTX 4080 + newTag: v2.6.3-cuda + # Using upstream multi-arch valkey image directly; the + # registry.ops.eblu.me/blumeops/valkey mirror is arm64-only (built + # on indri) and would crashloop on ringtail. 
- name: docker.io/valkey/valkey - newName: registry.ops.eblu.me/blumeops/valkey - newTag: v8.1.6-r0-fabca04 + newTag: "8.1.6" diff --git a/argocd/manifests/immich-ringtail/pv-nfs.yaml b/argocd/manifests/immich-ringtail/pv-nfs.yaml new file mode 100644 index 0000000..3d5a682 --- /dev/null +++ b/argocd/manifests/immich-ringtail/pv-nfs.yaml @@ -0,0 +1,29 @@ +# NFS PersistentVolume for Immich photo library on ringtail k3s. +# +# Mirror of argocd/manifests/immich/pv-nfs.yaml (minikube) but with +# a distinct name (minikube and ringtail are separate clusters, so PV +# names don't collide cluster-side, but using the same name in two +# manifests is confusing). +# +# The sifaka NFS export for /volume1/photos already permits +# 192.168.1.0/24 + 100.64.0.0/10. Ringtail's wired IP (192.168.1.21) +# falls in the first CIDR, so no DSM rule changes are needed. +# +# Verified 2026-05-13: ringtail pod can read existing dirs, write +# new files, and delete them. DNS resolves sifaka to 192.168.1.203 +# (LAN), so NFS traffic stays off the tailnet — avoids the known +# sifaka-tailscale-userspace bite. 
+apiVersion: v1 +kind: PersistentVolume +metadata: + name: immich-library-nfs-pv-ringtail +spec: + capacity: + storage: 2Ti + accessModes: + - ReadWriteMany + persistentVolumeReclaimPolicy: Retain + storageClassName: "" + nfs: + server: sifaka + path: /volume1/photos diff --git a/argocd/manifests/immich/pvc-ml-cache.yaml b/argocd/manifests/immich-ringtail/pvc-ml-cache.yaml similarity index 100% rename from argocd/manifests/immich/pvc-ml-cache.yaml rename to argocd/manifests/immich-ringtail/pvc-ml-cache.yaml diff --git a/argocd/manifests/immich/pvc.yaml b/argocd/manifests/immich-ringtail/pvc.yaml similarity index 54% rename from argocd/manifests/immich/pvc.yaml rename to argocd/manifests/immich-ringtail/pvc.yaml index c764636..5bfc052 100644 --- a/argocd/manifests/immich/pvc.yaml +++ b/argocd/manifests/immich-ringtail/pvc.yaml @@ -1,5 +1,5 @@ -# PersistentVolumeClaim for Immich photo library -# Binds to the NFS PV for sifaka:/volume1/photos +# PersistentVolumeClaim for Immich photo library on ringtail. +# Binds to immich-library-nfs-pv-ringtail (sifaka:/volume1/photos). 
apiVersion: v1 kind: PersistentVolumeClaim metadata: @@ -9,7 +9,7 @@ spec: accessModes: - ReadWriteMany storageClassName: "" - volumeName: immich-library-nfs-pv + volumeName: immich-library-nfs-pv-ringtail resources: requests: storage: 2Ti diff --git a/argocd/manifests/immich/service-ml.yaml b/argocd/manifests/immich-ringtail/service-ml.yaml similarity index 100% rename from argocd/manifests/immich/service-ml.yaml rename to argocd/manifests/immich-ringtail/service-ml.yaml diff --git a/argocd/manifests/immich/service-valkey.yaml b/argocd/manifests/immich-ringtail/service-valkey.yaml similarity index 100% rename from argocd/manifests/immich/service-valkey.yaml rename to argocd/manifests/immich-ringtail/service-valkey.yaml diff --git a/argocd/manifests/immich/service.yaml b/argocd/manifests/immich-ringtail/service.yaml similarity index 100% rename from argocd/manifests/immich/service.yaml rename to argocd/manifests/immich-ringtail/service.yaml diff --git a/argocd/manifests/immich/README.md b/argocd/manifests/immich/README.md deleted file mode 100644 index a82a856..0000000 --- a/argocd/manifests/immich/README.md +++ /dev/null @@ -1,115 +0,0 @@ -# Immich - -Self-hosted photo and video management solution with AI-powered search and face recognition. - -## Prerequisites - -1. **NFS Share**: Create `/volume1/photos` on sifaka with NFS permissions for indri -2. **PostgreSQL**: The `immich-pg` cluster (with pgvecto.rs) must be healthy -3. **Secrets**: Create the database password secret - -## Deployment Order - -1. Sync `blumeops-pg` (to get CloudNativePG operator if not already running) -2. Wait for `immich-pg` cluster to be healthy -3. Create secrets (see below) -4. Sync `immich` (deploys all resources: storage, services, deployments) -5. 
Run `mise run provision-indri -- --tags caddy` to update Caddy config - -## Components - -| Component | Deployment | Service | Port | -|-----------|------------|---------|------| -| Server (web/API) | `immich-server` | `immich-server` | 2283 | -| Machine Learning | `immich-machine-learning` | `immich-machine-learning` | 3003 | -| Valkey (Redis) | `immich-valkey` | `immich-valkey` | 6379 | - -## Secret Setup - -The `immich-db` secret contains the database password, which is auto-generated by CloudNativePG -in the `immich-pg-app` secret. To create or regenerate the secret: - -```bash -# Create namespace if needed -kubectl --context=minikube-indri create namespace immich - -# Copy password from CNPG secret to immich namespace -kubectl --context=minikube-indri create secret generic immich-db -n immich \ - --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)" -``` - -Note: This secret is not managed by ExternalSecrets since the source of truth is the CNPG-generated secret. - -## Access - -- **URL**: https://photos.ops.eblu.me (after Caddy is updated) -- **Tailscale**: https://photos.tail8d86e.ts.net (direct) - -## First-Time Setup - -1. Navigate to https://photos.ops.eblu.me -2. Create an admin account -3. Configure external library (optional - for importing existing photos) - -## External Library (iCloud Photos) - -To import existing photos from iCloud sync on indri: - -1. In Immich Admin > External Libraries, create a new library -2. Set the import path to the location where iCloud photos sync -3. 
Configure scan schedule or trigger manual scan - -## Architecture - -``` -┌─────────────────┐ ┌─────────────────┐ -│ immich-server │────▶│ immich-pg │ -│ (web/api) │ │ (PostgreSQL │ -└────────┬────────┘ │ + pgvecto.rs) │ - │ └─────────────────┘ - │ -┌────────▼────────┐ ┌─────────────────┐ -│ immich-ml │ │ valkey │ -│ (ML inference) │ │ (Redis cache) │ -└─────────────────┘ └─────────────────┘ - │ -┌────────▼────────┐ -│ sifaka NFS │ -│ /volume1/photos│ -└─────────────────┘ -``` - -## Version Management - -Image versions are controlled via `kustomization.yaml`: - -```yaml -images: - - name: ghcr.io/immich-app/immich-server - newTag: v2.6.3 - - name: ghcr.io/immich-app/immich-machine-learning - newTag: v2.6.3 - - name: docker.io/valkey/valkey - newTag: "8.1-alpine" -``` - -To upgrade, update `newTag` values and sync via ArgoCD. - -## Troubleshooting - -```bash -# Check pods -kubectl --context=minikube-indri -n immich get pods - -# Check immich-pg cluster -kubectl --context=minikube-indri -n databases get cluster immich-pg - -# View server logs -kubectl --context=minikube-indri -n immich logs -l app=immich,component=server - -# View ML logs -kubectl --context=minikube-indri -n immich logs -l app=immich,component=machine-learning - -# Check PVC binding -kubectl --context=minikube-indri -n immich get pvc -``` diff --git a/argocd/manifests/immich/pv-nfs.yaml b/argocd/manifests/immich/pv-nfs.yaml deleted file mode 100644 index 0bd6ee2..0000000 --- a/argocd/manifests/immich/pv-nfs.yaml +++ /dev/null @@ -1,22 +0,0 @@ -# NFS PersistentVolume for Immich photo library -# Requires: NFS share on sifaka at /volume1/photos with NFS permissions for indri -# -# To create on Synology: -# 1. Control Panel > Shared Folder > Create -# 2. Name: photos, Location: Volume 1 -# 3. Control Panel > File Services > NFS > NFS Rules -# 4. 
Add rule for "photos" share: Hostname=indri, Privilege=Read/Write, Squash=No mapping -apiVersion: v1 -kind: PersistentVolume -metadata: - name: immich-library-nfs-pv -spec: - capacity: - storage: 2Ti - accessModes: - - ReadWriteMany - persistentVolumeReclaimPolicy: Retain - storageClassName: "" - nfs: - server: sifaka - path: /volume1/photos diff --git a/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml b/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml index dee2fd7..100e7a9 100644 --- a/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml +++ b/argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml @@ -11,4 +11,4 @@ data: timeSlicing: resources: - name: nvidia.com/gpu - replicas: 2 + replicas: 4 diff --git a/docs/changelog.d/migrate-immich-to-ringtail.infra.md b/docs/changelog.d/migrate-immich-to-ringtail.infra.md new file mode 100644 index 0000000..b47742f --- /dev/null +++ b/docs/changelog.d/migrate-immich-to-ringtail.infra.md @@ -0,0 +1,13 @@ +Move the entire Immich stack — server, machine-learning, valkey, +and the PostgreSQL+VectorChord cluster — off `minikube-indri` and +onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG +`pg_basebackup` (replica catch-up then promote); row counts on +`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face` +verified equal between source and replica before cutover. The ML +pod now uses ringtail's RTX 4080 via the nvidia-device-plugin +(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy +routing at `photos.ops.eblu.me` is unchanged (still +`photos.tail8d86e.ts.net`, the device just lives on ringtail now). +Borgmatic backups continue against the same `immich-pg` tailnet +hostname. First concrete chain in the broader indri-k8s +decommission effort. 
diff --git a/docs/how-to/immich/cnpg-on-ringtail.md b/docs/how-to/immich/cnpg-on-ringtail.md new file mode 100644 index 0000000..153e674 --- /dev/null +++ b/docs/how-to/immich/cnpg-on-ringtail.md @@ -0,0 +1,52 @@ +--- +title: CNPG Operator on Ringtail +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - postgres + - ringtail +--- + +# CNPG Operator on Ringtail + +Bring up the `cloudnative-pg` operator on `k3s-ringtail`. Today the +operator only exists on `minikube-indri` (see +`argocd/apps/cloudnative-pg.yaml`, destination `kubernetes.default.svc`). + +Prerequisite of [[migrate-immich-to-ringtail]]; consumed by +[[immich-pg-on-ringtail]]. + +## What to do + +- Add a sibling `argocd/apps/cloudnative-pg-ringtail.yaml` pointing + at the same mirror (`mirrors/cloudnative-pg`, tag `v1.27.1`), + destination `https://ringtail.tail8d86e.ts.net:6443`, + namespace `cnpg-system`. +- Mirror the `ServerSideApply=true` and `CreateNamespace=true` sync + options (the CRDs exceed the annotation size limit). +- Sync `apps` then `cloudnative-pg-ringtail`. Verify the operator + pod is running on ringtail. + +## Verification + +```fish +kubectl --context=k3s-ringtail -n cnpg-system get pods +kubectl --context=k3s-ringtail get crd clusters.postgresql.cnpg.io +``` + +## Why a separate app + +Each ArgoCD app targets a single cluster via `destination.server`. +We could parameterize with ApplicationSets, but blumeops' convention +is to duplicate the manifest with a `-ringtail` suffix (see +`alloy-ringtail`, `external-secrets-ringtail`, etc.). Keep the +convention. + +## Out of scope + +- Postgres clusters themselves (`immich-pg`, etc.) — those come from + [[immich-pg-on-ringtail]]. +- Removing the minikube cnpg operator. That happens at the very end + of the indri-k8s decommission, not in this chain. 
diff --git a/docs/how-to/immich/immich-app-on-ringtail.md b/docs/how-to/immich/immich-app-on-ringtail.md new file mode 100644 index 0000000..51b619d --- /dev/null +++ b/docs/how-to/immich/immich-app-on-ringtail.md @@ -0,0 +1,91 @@ +--- +title: Immich App on Ringtail +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - immich +--- + +# Immich App on Ringtail + +Bring up `immich-server`, `immich-machine-learning`, and +`immich-valkey` on ringtail. This card stands the stack up against +the *new* pg cluster — it does not move user traffic. Cutover lives +in [[immich-cutover-and-decommission]]. + +## What to do + +- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix + matches the `-ringtail` convention used by other apps). Port from + `argocd/manifests/immich/`: + - `deployment-server.yaml` — point `DB_HOSTNAME` at the ringtail + pg service. + - `deployment-ml.yaml` — use `runtimeClassName: nvidia` + a + `resources.limits` for `nvidia.com/gpu: 1`. Use the `-cuda` tag + of the immich-ml image (set in kustomization). Ringtail is + single-node, so no node selector needed. See + `argocd/manifests/frigate/` for the existing GPU pod pattern. + + **GPU contention discovery:** ringtail's `nvidia-device-plugin` + is configured with `timeSlicing.replicas: 2`. Frigate + Ollama + already consume both virtual slices. Adding immich-ml requires + bumping the count to >= 3. Edit + `argocd/manifests/nvidia-device-plugin/time-slicing-config.yaml` + and re-sync the + `nvidia-device-plugin` ArgoCD app. The plugin pod restarts and + the new advertised count appears as the node's + `nvidia.com/gpu` allocatable. + - `deployment-valkey.yaml` — straight port, BUT use the upstream + multi-arch `docker.io/valkey/valkey:` image — do NOT + use the `registry.ops.eblu.me/blumeops/valkey` rewrite in the + kustomization.
That mirror was built on indri (arm64) and is + single-arch; pulling it on ringtail (amd64) gets `exec format + error` in CrashLoopBackOff. The mirror should eventually carry + a multi-arch tag, at which point the rewrite can return. + - `service*.yaml` — straight port. + - `pvc-ml-cache.yaml` — straight port (empty `local-path` PVC). + - `pv-nfs.yaml` + `pvc.yaml` — already covered by + [[sifaka-nfs-from-ringtail]] (may live in this dir or theirs). + - `ingress-tailscale.yaml` — ProxyGroup ingress, **must not** set + an explicit `host:` (or use `host: *`) per the lesson on + ProxyGroup VIP routing. + **Hostname collision warning:** the minikube ingress claims the + Tailscale device name `photos` (`tls.hosts: [photos]`). Two + devices on the tailnet cannot share that name. While the + ringtail deployment is being staged it must use a *different* + `tls.hosts` value (e.g. `photos-ringtail`) so it can coexist + with the running minikube one. The flip to `photos` happens at + cutover time, *after* the minikube ingress has been removed. + See [[immich-cutover-and-decommission#Cutover sequence]]. + - `kustomization.yaml` — same `images:` block (server, ML, valkey). +- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting + ringtail, namespace `immich`. **Manual sync only** until the + cutover. +- Existing `argocd/apps/immich.yaml` (minikube) stays untouched + during this card — both apps exist briefly. + +## Bring it up against a copy of the DB + +Use the throwaway/test path from [[immich-pg-data-migration#Dry run +before real cutover]]: point the ringtail immich at the *test* pg +cluster first, verify the pod boots, the web UI loads (via +`kubectl port-forward`), assets list, ML embeddings query. Then +tear it down. + +## Verification + +- All three pods Ready. +- ML pod has a GPU attached: `nvidia-smi` inside the container shows + the 4080. +- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in + logs). 
+- A `kubectl port-forward` to the server service shows the Immich + web UI. + +## Out of scope + +- Public/tailnet routing flip. Caddy still points at the minikube + Tailscale ingress until [[immich-cutover-and-decommission]]. +- Removing the minikube immich. Same. diff --git a/docs/how-to/immich/immich-cutover-and-decommission.md b/docs/how-to/immich/immich-cutover-and-decommission.md new file mode 100644 index 0000000..b44fddd --- /dev/null +++ b/docs/how-to/immich/immich-cutover-and-decommission.md @@ -0,0 +1,103 @@ +--- +title: Immich Cutover and Decommission +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - immich + - migration +--- + +# Immich Cutover and Decommission + +The user-visible flip. By the time this card opens, the ringtail +stack has been proven against a copy of the data. This card does the +real cutover. + +## Pre-cutover checklist + +- [[immich-pg-data-migration]] dry-run succeeded; method is chosen. +- Ringtail immich stack has been brought up against the test pg, + pods healthy, UI loaded ([[immich-app-on-ringtail#Verification]]). +- Borgmatic just ran successfully (a fresh nightly archive is a + belt-and-suspenders fallback, on top of the live source pg). +- User has been told to stop uploading from the iOS app for the + cutover window. + +## Cutover sequence + +1. **Quiesce source.** `kubectl --context=minikube-indri -n immich + scale deploy/immich-server --replicas=0` and same for ML. Leave + valkey + pg running. Confirm no client traffic on the source pg + via `pg_stat_activity`. +2. **Tear down the minikube Tailscale ingress.** The `photos` + Tailscale device name must be freed before ringtail's ingress can + claim it (Tailscale enforces uniqueness across the tailnet). + `kubectl --context=minikube-indri -n immich delete ingress + immich-tailscale` and wait for the corresponding `tailscale`-LB + StatefulSet pod to terminate. 
Verify the `photos` device is gone: + `tailscale status | grep -i photos` from any tailnet host. +3. **Final sync.** Per chosen method in + [[immich-pg-data-migration]]: + - Option A: promote the ringtail replica. + - Option B: take final `pg_dump`, restore to ringtail + `immich-pg`. +4. **Verify.** Run the row-count and schema-diff checks from + [[immich-pg-data-migration#Verification on the real run]]. +5. **Flip the ringtail ingress to `photos`.** Update + `argocd/manifests/immich-ringtail/ingress-tailscale.yaml`: + `tls.hosts: [photos]` (was `[photos-ringtail]` during staging per + [[immich-app-on-ringtail]]). Commit, `argocd app sync + immich-ringtail`. Wait for the `photos` device to register on the + tailnet again. +6. **Bring up ringtail immich** against the now-promoted pg + (`argocd app sync immich-ringtail`). Wait for Ready. +7. **Flip routing.** Update Caddy on indri + (`ansible/roles/caddy/defaults/main.yml`): `photos.ops.eblu.me` + upstream changes to the ringtail Tailscale ingress hostname + (`photos` — same MagicDNS name, now pointing to the ringtail + proxy). `mise run provision-indri -- --tags caddy`. +8. **Smoke test.** Open `photos.ops.eblu.me` in a browser. Sign in. + Scroll the timeline. Open an album. Trigger an ML search. +9. **Update borgmatic.** If the Tailscale hostname for pg changed, + update `borgmatic.cfg` on indri to point at the ringtail + `immich-pg-tailscale` service. Run a manual backup to verify. + +## After cutover + +- `argocd app set immich --revision ` is no longer relevant; + the minikube `immich` app gets deleted entirely. +- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and + the minikube `argocd/manifests/databases/immich-pg.yaml` + + `external-secret-immich-borgmatic.yaml` + + `service-immich-pg-tailscale.yaml`. +- Rename `immich-ringtail` back to `immich` (the `-ringtail` suffix + was scaffolding for the dual-cluster window; once minikube is + empty of immich, the unsuffixed name is clean). 
+- Confirm the minikube `immich-pg` PVC is no longer used, then + delete it (the PV with `Retain` policy will persist — clean that + up too). + +## Verification (definition of done) + +- `photos.ops.eblu.me` works for a real session, including ML search. +- Source minikube has no `immich` pods, no `immich-pg`, no PVCs. +- Memory pressure on minikube has dropped (≥1.5 GiB reclaimed). Check + `docker stats minikube` on indri. +- Nightly borgmatic run after the cutover completes successfully, + with the immich-pg archive showing the new source. + +## Rollback (within the cutover window) + +If smoke test fails: flip Caddy back, scale ringtail immich to 0, +scale source immich back up. Source pg was never destroyed. File a +plan reset on the relevant prerequisite card and try again next +session. + +## Out of scope + +- Decommissioning all of minikube. This chain just removes immich. + Other tenants migrate in their own chains as part of the broader + indri-k8s decommission. See [[migrate-immich-to-ringtail]] for + context. diff --git a/docs/how-to/immich/immich-pg-data-migration.md b/docs/how-to/immich/immich-pg-data-migration.md new file mode 100644 index 0000000..fb87783 --- /dev/null +++ b/docs/how-to/immich/immich-pg-data-migration.md @@ -0,0 +1,79 @@ +--- +title: Immich Postgres Data Migration +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - postgres + - immich + - critical +--- + +# Immich Postgres Data Migration + +**This is the data-loss surface of the migration.** Pick a method, +prove it on a throwaway copy first, then run the real cutover. + +## Decision: pick one + +### Option A — CNPG `externalCluster` bootstrap (preferred) + +Stand the ringtail cluster up as a streaming replica of the minikube +cluster via `bootstrap.pg_basebackup.source`. Replica catches up +online; when ready, promote it and point Immich at it. 
This is +CNPG's documented PG-to-PG migration path and gives near-zero data +loss (the WAL position at promote == the position at app stop). + +Requires: network path from ringtail to minikube's pg over the +tailnet (the existing `immich-pg-tailscale` Service works), and a +superuser secret minikube-side exposed to ringtail's basebackup. + +Pitfall to plan around: the ringtail Cluster CR will need its +`bootstrap` block rewritten *after* promotion (CNPG doesn't +gracefully drop the externalCluster reference). Account for this in +[[immich-pg-on-ringtail]] — it may force a reset of that card. + +### Option B — pg_dump / pg_restore + +Stop immich, `pg_dump -Fc` from minikube, scp to ringtail, restore. +Simpler but full downtime for the whole dump+restore window +(measure on a copy first — VectorChord indexes are slow to rebuild). +Smaller blast radius; no streaming-replication moving parts. + +Use this if Option A hits any blocker. Data loss should still be +zero if the source is stopped first. + +### Option C — leave pg on minikube + +Rejected. See goal card [[migrate-immich-to-ringtail#Why postgres on +ringtail (not cross-cluster)]]. + +## Dry run before real cutover + +Whichever option wins: + +1. Snapshot the minikube `immich-pg` PVC or take a fresh `pg_dump` + into a scratch location. +2. Restore into a *separate* ringtail CNPG cluster (different name, + e.g. `immich-pg-test`) and point a scratch immich-server pod at + it. +3. Verify: pod boots, can list assets, ML embeddings query without + error, face thumbnails render. VectorChord-backed queries should + not error. +4. Tear the scratch cluster down before doing the real one. + +## Verification on the real run + +- Row counts match for `assets`, `albums`, `users`, `face`, + `asset_face`, `smart_search` (the embedding table) — script this. +- `pg_dump --schema-only --no-owner` diff between source and dest + should be empty modulo CNPG-managed roles. 
+- Immich `/api/server-info/version` and `/api/server-info/statistics` + return sane numbers. + +## Rollback + +If the cutover fails verification: stop the ringtail immich, repoint +ArgoCD `immich.destination` back to minikube, re-sync. Source pg was +never deleted. Document what failed and reset the chain. diff --git a/docs/how-to/immich/immich-pg-on-ringtail.md b/docs/how-to/immich/immich-pg-on-ringtail.md new file mode 100644 index 0000000..10c7072 --- /dev/null +++ b/docs/how-to/immich/immich-pg-on-ringtail.md @@ -0,0 +1,69 @@ +--- +title: Immich Postgres Cluster on Ringtail +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - postgres + - immich +--- + +# Immich Postgres Cluster on Ringtail + +Stand up a fresh `immich-pg` CNPG Cluster on ringtail, ready to receive +data. **No data import yet** — that's [[immich-pg-data-migration]]. + +## What to do + +- Create `argocd/manifests/databases-ringtail/` (or pick another + namespace name — verify what other ringtail pg clusters will use; + if none yet, `databases` is fine). +- Port these from the minikube side: + - `immich-pg.yaml` — CNPG Cluster CR. Same image + (`ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0`), same + extensions, same managed `borgmatic` role. Bump `storage.size` if + the minikube 10 GiB looks tight (check actual usage first). + `storageClass: local-path` on ringtail (default). + - `external-secret-immich-borgmatic.yaml` — same 1Password item, + same field, but referencing the ringtail `ClusterSecretStore` + (`onepassword-blumeops` already exists per the + `external-secrets-ringtail` app). + - Service for in-cluster access (the operator creates `immich-pg-rw` + etc. automatically; verify the app deployment uses those names). + - A Tailscale Service if we want backups to keep working via the + same hostname during the transition — see "Borgmatic" below. +- New ArgoCD app `argocd/apps/databases-ringtail.yaml` pointing at + the new path, destination ringtail. 
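The "What to do" list for the ringtail `immich-pg` cluster can be sketched as a single CNPG `Cluster` resource. The image, the managed `borgmatic` role with `pg_read_all_data`, the `local-path` storage class, the 10 GiB starting size, the `postInitSQL` extensions, and the `vchord.so` preload all come from this card. The `instances: 1` count, the `immich` owner name, the `databases` namespace, and the exact `CREATE EXTENSION` statements are assumptions, so treat this as a starting point to diff against the minikube manifest, not as the ported file.

```yaml
# Hypothetical sketch, not the manifest from this patch.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases              # assumption: same namespace name as minikube
spec:
  instances: 1                      # assumption: ringtail is single-node
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  postgresql:
    shared_preload_libraries:
      - "vchord.so"                 # preloaded so CREATE EXTENSION works later
  bootstrap:
    initdb:
      database: immich
      owner: immich                 # assumption: owner role name
      # Runs against the `postgres` database, not `immich`.
      postInitSQL:
        - CREATE EXTENSION IF NOT EXISTS vchord CASCADE;   # pulls in `vector`
        - CREATE EXTENSION IF NOT EXISTS cube;
        - CREATE EXTENSION IF NOT EXISTS earthdistance;
  managed:
    roles:
      - name: borgmatic
        ensure: present
        login: true
        inRoles:
          - pg_read_all_data
        passwordSecret:
          name: immich-pg-borgmatic # Secret rendered by the ExternalSecret
  storage:
    size: 10Gi                      # bump if actual minikube usage looks tight
    storageClass: local-path
```

If Option A from [[immich-pg-data-migration]] is chosen, the `bootstrap.initdb` block would be replaced by a `pg_basebackup` bootstrap with an `externalClusters` entry, which is the reset this card warns about.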
+ +## Verification + +- Cluster reaches `Ready`. +- `borgmatic` role exists, `rolcanlogin=t`, and is a member of + `pg_read_all_data` (via `managed.roles[].inRoles`). +- ExternalSecret `immich-pg-borgmatic` syncs from 1Password + (`Ready: True`) and the rendered Secret has `username=borgmatic`. +- The `vchord`, `vector`, `cube`, `earthdistance` extensions show + installed in the `postgres` database (`\dx` from + `psql -U postgres`). They are NOT installed in the `immich` + database at this point — `postInitSQL` in CNPG's `initdb` block + runs against the `postgres` superuser database. The Immich app + itself creates the extensions in its own `immich` database at + startup; do not be alarmed by their absence pre-immich-deploy. + The `vchord.so` library is preloaded via + `shared_preload_libraries` regardless, so `CREATE EXTENSION` at + app startup just registers it in the right database. + +## Borgmatic implications + +`borgmatic.cfg` on indri targets `immich-pg-tailscale` over the +tailnet. During migration both clusters will exist briefly. Decide +upfront: backup the *source* pg until cutover, then flip borgmatic +to the ringtail Tailscale service. Document the flip in +[[immich-cutover-and-decommission]]. + +## Out of scope + +- Importing data. That is [[immich-pg-data-migration]], which may + drive a reset on this card if the migration approach (e.g. CNPG + `externalCluster` bootstrap) requires changes to this Cluster CR. diff --git a/docs/how-to/immich/migrate-immich-to-ringtail.md b/docs/how-to/immich/migrate-immich-to-ringtail.md new file mode 100644 index 0000000..cd23384 --- /dev/null +++ b/docs/how-to/immich/migrate-immich-to-ringtail.md @@ -0,0 +1,132 @@ +--- +title: Migrate Immich to Ringtail +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - immich + - migration +--- + +# Migrate Immich to Ringtail + +Move the entire Immich stack (server, ML, valkey, postgres) off +`minikube-indri` and onto `k3s-ringtail`. 
This is the first concrete +chain in the broader indri-k8s decommission: minikube is +memory-saturated (97% RAM, swapping), and Immich is the single +largest tenant (~1.5 GiB resident). + +## End state + +- Immich `server`, `machine-learning`, and `valkey` Deployments run on + ringtail k3s in the `immich` namespace. +- The `immich-machine-learning` pod uses ringtail's RTX 4080 via the + `nvidia-device-plugin` (performance win — currently CPU-only on + minikube). +- A CNPG `immich-pg` Cluster (PostgreSQL 17 + VectorChord) runs in a + `databases` namespace on ringtail, owned by the `cnpg-system` + operator on ringtail. +- The photo library still lives on [[sifaka]] at `/volume1/photos`, + mounted via NFS from ringtail pods (RWX). +- Routing: `photos.ops.eblu.me` (Caddy on indri) proxies to a + Tailscale ProxyGroup ingress on ringtail. No public surface today. +- The ArgoCD `immich` app's `destination.server` points at + `https://ringtail.tail8d86e.ts.net:6443`. The old minikube + manifests are removed. + +## Non-goals + +- Public exposure via Fly. Immich stays tailnet-only. +- Changing the immich version or runtime configuration. This is a + lift-and-shift; bumps come later. +- Backing up to a different target. [[borgmatic]] keeps running on + indri (it pulls via Tailscale and uses sifaka SMB for the library). + +## Critical constraint: no data loss + +Downtime is acceptable (Immich is a single-user system; we can take +it offline for the cutover). **Data loss is not.** Two surfaces matter: + +1. **Postgres** — face data, ML embeddings (vectors), album state, + sharing, etc. Re-derivable in theory; weeks of recompute in + practice. See [[immich-pg-data-migration]]. +2. **Library files** — `/volume1/photos`. Not moving, but the NFS + path must be verified accessible from ringtail before cutover. + See [[sifaka-nfs-from-ringtail]]. + +[[borgmatic]] backs both up to sifaka + BorgBase nightly; restore is +possible but slow. Treat it as a fallback, not a plan. 
+ +## Why postgres on ringtail (not cross-cluster) + +`immich-pg` already has a Tailscale Service we could point ringtail +at, leaving the DB on minikube. We're not doing that because: + +- The whole goal is to retire minikube — keeping pg there blocks it. +- Immich is chatty against pg; tailnet round-trips would hurt. +- CNPG is the same operator on both sides — a Cluster CR on ringtail + is mechanically equivalent. + +## Approach + +This is a C2 Mikado chain. The prerequisite cards each represent a +distinct surface that has to work before cutover. See +[[agent-change-process#C2 — Mikado Chain]] for the discipline. + +## Workflow note: registering new ArgoCD apps during the chain + +This chain adds three new ArgoCD `Application` definitions in +`argocd/apps/`: `cloudnative-pg-ringtail`, `databases-ringtail`, +and (later) `immich-ringtail`. The usual C1/C2 pattern of +`argocd app set --revision && argocd app sync ` +does NOT work for the app-of-apps `apps` Application itself, because +`apps` self-manages: it re-reads `apps.yaml` (which declares +`targetRevision: main`) on every sync and reverts the override. As a +result, new app definitions added on a feature branch are never +visible to the cluster via `apps`. + +**Use `kubectl apply` to register each new Application directly:** + +```fish +kubectl --context=minikube-indri apply -f argocd/apps/.yaml +``` + +This creates the Application resource out-of-band, bypassing `apps`. + +For apps whose source lives in **this** repo (e.g. +`databases-ringtail`, `immich-ringtail` — manifest paths exist only +on the branch until merge), follow the apply with a branch override: + +```fish +argocd app set --revision mikado/migrate-immich-to-ringtail +argocd app sync +``` + +For apps whose source is an **external** repo at a pinned tag (e.g. +`cloudnative-pg-ringtail` → `mirrors/cloudnative-pg` `v1.27.1`), no +override is needed — the source revision is independent of this PR. 
+ +After PR merge: + +```fish +argocd app set --revision main +argocd app sync +``` + +`apps` itself, on its next sync from `main`, will discover the new +Application definitions in `argocd/apps/` and adopt the already-running +resources without disruption — provided their in-cluster spec matches +the on-disk definitions (which it does because we applied the same +file). + +## Related + +- [[shower-on-ringtail]] — a previous migration to ringtail (simpler: + no upstream cluster, SQLite, no GPU) +- [[connect-to-postgres]] — getting a psql session against CNPG +- [[ringtail]] — the target cluster +- [[cnpg-on-ringtail]], [[immich-pg-on-ringtail]], + [[immich-pg-data-migration]], [[sifaka-nfs-from-ringtail]], + [[immich-app-on-ringtail]], [[immich-cutover-and-decommission]] — + the prerequisite cards diff --git a/docs/how-to/immich/sifaka-nfs-from-ringtail.md b/docs/how-to/immich/sifaka-nfs-from-ringtail.md new file mode 100644 index 0000000..2c490c1 --- /dev/null +++ b/docs/how-to/immich/sifaka-nfs-from-ringtail.md @@ -0,0 +1,67 @@ +--- +title: Sifaka NFS Photos from Ringtail +modified: 2026-05-13 +last-reviewed: 2026-05-13 +tags: + - how-to + - operations + - storage + - nfs + - sifaka +--- + +# Sifaka NFS Photos from Ringtail + +The Immich library lives at `sifaka:/volume1/photos` and is mounted +into the pod via an NFS PV (see `argocd/manifests/immich/pv-nfs.yaml`). +That PV is currently scoped to indri. We need ringtail to mount the +same path with the same RWX semantics, without breaking the existing +indri mount during the transition. + +## What to verify / do + +- Check `sifaka` DSM NFS rules for the `photos` share. Per + [[shower-on-ringtail#NFS + SMB share on sifaka]] convention, rules + use `192.168.1.0/24` + `100.64.0.0/10` with + `all_squash`/`Map all users to admin`. The existing rule may + already cover ringtail (it's on `192.168.1.21` per the recent + static-IP pin). If so this card is a verification card. 
+- If the rule is locked to indri's IP: add an entry for ringtail + (192.168.1.21) or widen to the subnet pattern above. +- Test mount from a ringtail debug pod (busybox or alpine with + nfs-utils) against the `photos` share. Read a file. Write a temp + file. Delete it. +- Watch for the known sifaka NFS-over-Tailscale gotcha: sifaka's + Tailscale must be in TUN mode (not userspace) for NFS to work + reliably over the tailnet. The NFS path here goes over the LAN + (not tailnet), so this shouldn't bite, but it is worth confirming the + NFS traffic is on `192.168.1.x` not `100.x`. + +## PV + PVC on ringtail + +- New `pv-nfs.yaml` mirroring the minikube one (PV names are + cluster-scoped and each cluster keeps its own set of PVs, so the + same name can be reused; just duplicate the manifest). Same `server: sifaka`, same path, same + `accessModes: [ReadWriteMany]`, `persistentVolumeReclaimPolicy: + Retain`. +- New `pvc.yaml` in the ringtail `immich` namespace bound to it. +- The minikube PVC stays bound and active until cutover — both + clusters can have the share NFS-mounted simultaneously (NFS RWX + permits this). Immich itself must not be running on both sides + at once. + +## Verification + +- A pod on ringtail can `ls /mnt/photos/` and see the same files + as the indri pod. +- File written from ringtail pod is visible from indri pod and + vice versa (proves there's no caching surprise). + +## Out of scope + +- Migrating photo files. Nothing moves; this is just adding a second + NFS client. +- The `pvc-ml-cache.yaml` PVC (a separate ML model cache). That's + not on NFS — it's a regular PVC. Recreated empty on ringtail in + [[immich-app-on-ringtail]]; the first ML pod boot will repopulate + it.
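The "PV + PVC on ringtail" section of the sifaka NFS card can be sketched as the pair of manifests below. The PV fields mirror the minikube `pv-nfs.yaml` removed in this patch (name, 2Ti capacity, RWX, `Retain`, empty storage class, `sifaka:/volume1/photos`). The PVC name `immich-library` is hypothetical, and the ringtail `pv-nfs.yaml` added by this patch is longer than the minikube one, so it likely carries extra settings not shown here.

```yaml
# Hypothetical sketch, not the manifests from this patch.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs-pv       # same name as on minikube; PVs are per-cluster
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # empty: no dynamic provisioner involved
  nfs:
    server: sifaka
    path: /volume1/photos
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: immich-library              # hypothetical name; match the ported pvc.yaml
  namespace: immich
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""              # must match the PV for static binding
  volumeName: immich-library-nfs-pv # bind explicitly to the PV above
  resources:
    requests:
      storage: 2Ti
```

Pinning `volumeName` keeps the claim from binding to any other volume, and the empty `storageClassName` on both objects stops ringtail's default `local-path` provisioner from creating a fresh volume instead.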