C2: migrate immich from minikube to ringtail (mikado chain) (#356)

## Summary

C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.

## Goal end-state

- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
  CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.

## Cards

| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |

## Key constraints

- **No data loss.** Downtime is acceptable; data loss is not. Two
  surfaces matter: postgres (ML embeddings, face data; slow to
  re-derive) and the library files (these don't move, but NFS access
  from ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster`
  basebackup → promote. Option B is `pg_dump`/`pg_restore` as a
  documented fallback. Either way, dry-run against a scratch
  cluster first.
- **Why pg moves too** (instead of Immich on ringtail reaching pg on
  minikube cross-cluster): keeping pg on minikube would block the
  whole decommission, and Immich is chatty with pg, so tailnet
  round-trips would hurt.
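
For orientation, Option A could look roughly like the replica-cluster manifest below. This is a sketch under assumptions, not the manifest used in this chain: the `externalClusters` entry name, the source host, and the two Secret names are hypothetical, and it assumes the source cluster's replication client certificate and CA have been copied into the `databases` namespace on ringtail.

```yaml
# Sketch only: CNPG replica cluster bootstrapped from the minikube source.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi
    storageClass: local-path
  postgresql:
    shared_preload_libraries:
      - "vchord.so"
  bootstrap:
    pg_basebackup:
      source: immich-pg-minikube            # hypothetical entry name
  replica:
    enabled: true                           # flip to false to promote
    source: immich-pg-minikube
  externalClusters:
    - name: immich-pg-minikube
      connectionParameters:
        host: immich-pg.tail8d86e.ts.net    # assumed source endpoint
        user: streaming_replica
        sslmode: verify-ca                  # assumption: cert SANs won't match the tailnet name
      sslKey:
        name: immich-pg-src-replication     # hypothetical copied Secret
        key: tls.key
      sslCert:
        name: immich-pg-src-replication
        key: tls.crt
      sslRootCert:
        name: immich-pg-src-ca              # hypothetical copied Secret
        key: ca.crt
```

In this sketch, promotion would be done by setting `replica.enabled` to `false` once the replica has caught up; the `bootstrap` and `externalClusters` blocks can then be removed from the manifest.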

## Test plan

- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
      chain correctly.
- [ ] Per-card implementation cycles land separately (commit
      convention enforced by hook).

Reviewed-on: #356
Erich Blume 2026-05-13 16:46:17 -07:00
commit 947e4310c3
32 changed files with 820 additions and 265 deletions

@ -0,0 +1,27 @@
# CloudNativePG Operator for ringtail k3s cluster
# Deploys the operator only; PostgreSQL clusters are created separately
#
# Sibling of cloudnative-pg.yaml (minikube). Same mirror, same release,
# different destination. Both apps will coexist during the immich
# migration; the minikube one is removed at the end of the broader
# indri-k8s decommission.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git
    targetRevision: v1.27.1
    path: releases
    directory:
      include: 'cnpg-1.27.1.yaml'
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: cnpg-system
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true  # Required for large CRDs that exceed annotation size limit

@ -0,0 +1,26 @@
# Databases on ringtail k3s.
#
# Today: only immich-pg (CNPG Cluster) + its borgmatic ExternalSecret.
# More databases may move here as the indri-k8s decommission proceeds.
#
# Prerequisites:
# - cloudnative-pg-ringtail (operator must exist before the Cluster CR)
# - external-secrets-ringtail + 1password-connect-ringtail (for the
# immich-pg-borgmatic ExternalSecret to sync)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: databases-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/databases-ringtail
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: databases
  syncPolicy:
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,31 @@
# Immich on ringtail k3s.
#
# Staging deployment; the minikube `immich` app remains in parallel
# until cutover. See [[immich-cutover-and-decommission]] for the
# routing flip + minikube cleanup.
#
# Prerequisites:
# - cnpg-on-ringtail + databases-ringtail (postgres)
# - 1password-connect-ringtail + external-secrets-ringtail (not used
# by this app today — immich-db Secret is created manually,
# matching the minikube pattern)
# - The immich-db Secret in the immich namespace, holding the
# password for the `immich` postgres role (copied from the source
# immich-pg-app Secret at migration time).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/immich-ringtail
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: immich
  syncPolicy:
    syncOptions:
      - CreateNamespace=true

@ -1,30 +0,0 @@
# Immich - Self-hosted photo and video management
# High-performance Google Photos/iCloud alternative with AI features
#
# Kustomize manifests in argocd/manifests/immich/
# Components: server, machine-learning, valkey (Redis)
#
# Prerequisites:
# 1. Create immich namespace and secrets:
# kubectl create namespace immich
# kubectl --context=minikube-indri create secret generic immich-db -n immich \
# --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)"
# 2. Create immich-pg database and user (see immich-pg app)
# 3. NFS share on sifaka at /volume1/photos with read/write for indri
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/immich
  destination:
    server: https://kubernetes.default.svc
    namespace: immich
  syncPolicy:
    syncOptions:
      - CreateNamespace=true

@ -1,9 +1,12 @@
# ExternalSecret for borgmatic backup user password on immich-pg cluster
# (ringtail k3s).
#
# Mirror of argocd/manifests/databases/external-secret-immich-borgmatic.yaml.
# The onepassword-blumeops ClusterSecretStore exists on ringtail via the
# external-secrets-ringtail app.
#
# Reuses the same 1Password item as blumeops-pg-borgmatic.
# 1Password item: "borgmatic" in blumeops vault
# Field: "db-password"
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
@ -23,7 +26,7 @@ spec:
username: borgmatic
password: "{{ .password }}"
data:
- secretKey: password
remoteRef:
key: borgmatic
property: db-password
- secretKey: password
remoteRef:
key: borgmatic
property: db-password

@ -0,0 +1,53 @@
# PostgreSQL Cluster for Immich on ringtail k3s.
#
# Initially bootstrapped via CNPG pg_basebackup from the minikube
# immich-pg cluster on 2026-05-13, then promoted to primary. The
# externalClusters + bootstrap.pg_basebackup blocks have been pruned
# from this manifest now that the migration is complete — leaving
# them around is a footgun (re-enabling replica.enabled=true would
# try to demote this cluster against a stale source). See
# [[immich-pg-data-migration]] for the procedure used.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi
    storageClass: local-path
  # Managed roles
  managed:
    roles:
      - name: borgmatic
        login: true
        connectionLimit: -1
        ensure: present
        inherit: true
        inRoles:
          - pg_read_all_data
        passwordSecret:
          name: immich-pg-borgmatic
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "500m"
  postgresql:
    shared_preload_libraries:
      - "vchord.so"
    parameters:
      max_connections: "50"
      shared_buffers: "128MB"
      password_encryption: "scram-sha-256"
    pg_hba:
      - host all all 0.0.0.0/0 scram-sha-256
      - host all all ::/0 scram-sha-256

@ -0,0 +1,9 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: databases
resources:
- immich-pg.yaml
- external-secret-immich-borgmatic.yaml
- service-immich-pg-tailscale.yaml

@ -1,6 +1,8 @@
-# Tailscale LoadBalancer for immich-pg PostgreSQL access
-# Canonical hostname: immich-pg.tail8d86e.ts.net
-# Caddy L4 proxies pg.ops.eblu.me:5433 → this service for borgmatic backups
+# Tailscale LoadBalancer for immich-pg PostgreSQL access on ringtail.
+# Canonical hostname: immich-pg.tail8d86e.ts.net (claimed from the
+# minikube side after the minikube service was removed during the
+# immich-to-ringtail migration). Borgmatic on indri uses this
+# hostname for nightly backups.
 apiVersion: v1
 kind: Service
 metadata:

@ -1,69 +0,0 @@
# PostgreSQL Cluster for Immich
# Uses VectorChord (successor to pgvecto.rs) for AI-powered vector search
# See: https://github.com/immich-app/immich/discussions/9060
# Managed by CloudNativePG operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1
  # VectorChord image for PostgreSQL 17 with VectorChord 0.5.0
  # Immich v2.4.1 requires VectorChord >=0.3 <0.6
  # See: https://github.com/tensorchord/VectorChord
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi
    storageClass: standard
  # Bootstrap creates initial database and owner
  bootstrap:
    initdb:
      database: immich
      owner: immich
      postInitSQL:
        # Extensions required by Immich
        - CREATE EXTENSION IF NOT EXISTS vector;
        - CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
        - CREATE EXTENSION IF NOT EXISTS cube CASCADE;
        - CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE;
  # Managed roles
  # Note: connectionLimit, ensure, inherit are CNPG defaults added to prevent ArgoCD drift
  managed:
    roles:
      # borgmatic read-only user for backups
      - name: borgmatic
        login: true
        connectionLimit: -1
        ensure: present
        inherit: true
        inRoles:
          - pg_read_all_data
        passwordSecret:
          name: immich-pg-borgmatic
  # Resource limits for minikube environment
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "500m"
  # PostgreSQL configuration
  postgresql:
    # VectorChord requires vchord.so in shared_preload_libraries
    shared_preload_libraries:
      - "vchord.so"
    parameters:
      max_connections: "50"
      shared_buffers: "128MB"
      password_encryption: "scram-sha-256"
    pg_hba:
      # Allow connections from k8s pods
      - host all all 0.0.0.0/0 scram-sha-256
      - host all all ::/0 scram-sha-256

@ -5,13 +5,10 @@ namespace: databases
 resources:
   - blumeops-pg.yaml
-  - immich-pg.yaml
   - service-tailscale.yaml
-  - service-immich-pg-tailscale.yaml
   - service-metrics-tailscale.yaml
   - external-secret-eblume.yaml
   - external-secret-borgmatic.yaml
-  - external-secret-immich-borgmatic.yaml
   - external-secret-teslamate.yaml
   - external-secret-authentik.yaml
   - external-secret-paperless.yaml

@ -16,11 +16,16 @@ spec:
         app: immich
         component: machine-learning
     spec:
+      runtimeClassName: nvidia
       securityContext:
         seccompProfile:
           type: RuntimeDefault
       containers:
         - name: machine-learning
+          # ringtail uses the -cuda tag (set in kustomization.yaml)
+          # to take advantage of the RTX 4080 via the nvidia
+          # device plugin. Time-slicing is configured for 4 replicas
+          # so frigate + ollama + this pod can share.
           image: ghcr.io/immich-app/immich-machine-learning:kustomized
           ports:
             - name: http
@ -57,6 +62,7 @@ spec:
               cpu: "100m"
             limits:
               memory: "4Gi"
+              nvidia.com/gpu: "1"
       volumes:
         - name: cache
           persistentVolumeClaim:

@ -1,6 +1,9 @@
-# Tailscale Ingress for Immich
-# Exposes Immich at photos.tail8d86e.ts.net
-# Caddy will proxy photos.ops.eblu.me to this endpoint
+# Tailscale ProxyGroup Ingress for Immich on ringtail.
+#
+# Production hostname: photos.tail8d86e.ts.net
+# (during the cutover window this was photos-ringtail; the minikube
+# ingress was torn down before this was renamed to photos to avoid
+# the Tailscale device-name collision.)
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
@ -16,12 +19,6 @@ metadata:
     gethomepage.dev/description: "Photo management"
     gethomepage.dev/href: "https://photos.ops.eblu.me"
     gethomepage.dev/pod-selector: "app=immich,component=server"
-    # TODO: Add Immich widget - requires API key from Account Settings > API Keys
-    # See: https://gethomepage.dev/widgets/services/immich/
-    # gethomepage.dev/widget.type: "immich"
-    # gethomepage.dev/widget.url: "https://photos.ops.eblu.me"
-    # gethomepage.dev/widget.key: "{{HOMEPAGE_VAR_IMMICH_API_KEY}}"
-    # gethomepage.dev/widget.version: "2"
 spec:
   ingressClassName: tailscale
   rules:

@ -1,7 +1,8 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: immich
resources:
- deployment-server.yaml
- deployment-ml.yaml
@ -13,11 +14,15 @@ resources:
   - pv-nfs.yaml
   - pvc.yaml
   - ingress-tailscale.yaml
 images:
   - name: ghcr.io/immich-app/immich-server
     newTag: v2.6.3
   - name: ghcr.io/immich-app/immich-machine-learning
-    newTag: v2.6.3
+    # CUDA variant of the same release; ringtail has an RTX 4080
+    newTag: v2.6.3-cuda
+  # Using upstream multi-arch valkey image directly; the
+  # registry.ops.eblu.me/blumeops/valkey mirror is arm64-only (built
+  # on indri) and would crashloop on ringtail.
   - name: docker.io/valkey/valkey
-    newName: registry.ops.eblu.me/blumeops/valkey
-    newTag: v8.1.6-r0-fabca04
+    newTag: "8.1.6"

@ -0,0 +1,29 @@
# NFS PersistentVolume for Immich photo library on ringtail k3s.
#
# Mirror of argocd/manifests/immich/pv-nfs.yaml (minikube) but with
# a distinct name (minikube and ringtail are separate clusters, so PV
# names don't collide cluster-side, but using the same name in two
# manifests is confusing).
#
# The sifaka NFS export for /volume1/photos already permits
# 192.168.1.0/24 + 100.64.0.0/10. Ringtail's wired IP (192.168.1.21)
# falls in the first CIDR, so no DSM rule changes are needed.
#
# Verified 2026-05-13: ringtail pod can read existing dirs, write
# new files, and delete them. DNS resolves sifaka to 192.168.1.203
# (LAN), so NFS traffic stays off the tailnet — avoids the known
# sifaka-tailscale-userspace bite.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs-pv-ringtail
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: sifaka
    path: /volume1/photos

@ -1,5 +1,5 @@
-# PersistentVolumeClaim for Immich photo library
-# Binds to the NFS PV for sifaka:/volume1/photos
+# PersistentVolumeClaim for Immich photo library on ringtail.
+# Binds to immich-library-nfs-pv-ringtail (sifaka:/volume1/photos).
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
@ -9,7 +9,7 @@ spec:
   accessModes:
     - ReadWriteMany
   storageClassName: ""
-  volumeName: immich-library-nfs-pv
+  volumeName: immich-library-nfs-pv-ringtail
   resources:
     requests:
       storage: 2Ti

@ -1,115 +0,0 @@
# Immich
Self-hosted photo and video management solution with AI-powered search and face recognition.
## Prerequisites
1. **NFS Share**: Create `/volume1/photos` on sifaka with NFS permissions for indri
2. **PostgreSQL**: The `immich-pg` cluster (with pgvecto.rs) must be healthy
3. **Secrets**: Create the database password secret
## Deployment Order
1. Sync `blumeops-pg` (to get CloudNativePG operator if not already running)
2. Wait for `immich-pg` cluster to be healthy
3. Create secrets (see below)
4. Sync `immich` (deploys all resources: storage, services, deployments)
5. Run `mise run provision-indri -- --tags caddy` to update Caddy config
## Components
| Component | Deployment | Service | Port |
|-----------|------------|---------|------|
| Server (web/API) | `immich-server` | `immich-server` | 2283 |
| Machine Learning | `immich-machine-learning` | `immich-machine-learning` | 3003 |
| Valkey (Redis) | `immich-valkey` | `immich-valkey` | 6379 |
## Secret Setup
The `immich-db` secret contains the database password, which is auto-generated by CloudNativePG
in the `immich-pg-app` secret. To create or regenerate the secret:
```bash
# Create namespace if needed
kubectl --context=minikube-indri create namespace immich
# Copy password from CNPG secret to immich namespace
kubectl --context=minikube-indri create secret generic immich-db -n immich \
--from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)"
```
Note: This secret is not managed by ExternalSecrets since the source of truth is the CNPG-generated secret.
## Access
- **URL**: https://photos.ops.eblu.me (after Caddy is updated)
- **Tailscale**: https://photos.tail8d86e.ts.net (direct)
## First-Time Setup
1. Navigate to https://photos.ops.eblu.me
2. Create an admin account
3. Configure external library (optional - for importing existing photos)
## External Library (iCloud Photos)
To import existing photos from iCloud sync on indri:
1. In Immich Admin > External Libraries, create a new library
2. Set the import path to the location where iCloud photos sync
3. Configure scan schedule or trigger manual scan
## Architecture
```
┌─────────────────┐ ┌─────────────────┐
│ immich-server │────▶│ immich-pg │
│ (web/api) │ │ (PostgreSQL │
└────────┬────────┘ │ + pgvecto.rs) │
│ └─────────────────┘
┌────────▼────────┐ ┌─────────────────┐
│ immich-ml │ │ valkey │
│ (ML inference) │ │ (Redis cache) │
└─────────────────┘ └─────────────────┘
┌────────▼────────┐
│ sifaka NFS │
│ /volume1/photos│
└─────────────────┘
```
## Version Management
Image versions are controlled via `kustomization.yaml`:
```yaml
images:
- name: ghcr.io/immich-app/immich-server
newTag: v2.6.3
- name: ghcr.io/immich-app/immich-machine-learning
newTag: v2.6.3
- name: docker.io/valkey/valkey
newTag: "8.1-alpine"
```
To upgrade, update `newTag` values and sync via ArgoCD.
## Troubleshooting
```bash
# Check pods
kubectl --context=minikube-indri -n immich get pods
# Check immich-pg cluster
kubectl --context=minikube-indri -n databases get cluster immich-pg
# View server logs
kubectl --context=minikube-indri -n immich logs -l app=immich,component=server
# View ML logs
kubectl --context=minikube-indri -n immich logs -l app=immich,component=machine-learning
# Check PVC binding
kubectl --context=minikube-indri -n immich get pvc
```

@ -1,22 +0,0 @@
# NFS PersistentVolume for Immich photo library
# Requires: NFS share on sifaka at /volume1/photos with NFS permissions for indri
#
# To create on Synology:
# 1. Control Panel > Shared Folder > Create
# 2. Name: photos, Location: Volume 1
# 3. Control Panel > File Services > NFS > NFS Rules
# 4. Add rule for "photos" share: Hostname=indri, Privilege=Read/Write, Squash=No mapping
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs-pv
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: sifaka
    path: /volume1/photos

@ -11,4 +11,4 @@ data:
 timeSlicing:
   resources:
     - name: nvidia.com/gpu
-      replicas: 2
+      replicas: 4
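
For context, the fragment above typically sits inside the NVIDIA device plugin's sharing config. The sketch below assumes the standard `version: v1` / `sharing` layout; the surrounding ConfigMap name and data key are not visible in this hunk and are left out.

```yaml
# Sketch only: assumed shape of the device plugin config around this hunk.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4   # one physical GPU is advertised as 4 schedulable nvidia.com/gpu units
```

With four advertised units, the three pods named in the machine-learning deployment comment (frigate, ollama, and Immich ML) can each request `nvidia.com/gpu: "1"` on the single RTX 4080. Time-slicing shares compute only; it does not isolate GPU memory between the pods.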