C2: migrate immich from minikube to ringtail (mikado chain) (#356)

## Summary

C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.

## Goal end-state

- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
  CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.

## Cards

| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |

## Key constraints

- **No data loss.** Downtime is acceptable; data loss is not. Two
  surfaces matter: postgres (ML embeddings, face data — slow to
  re-derive) and the library files (don't move, but NFS access from
  ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster`
  basebackup → promote. Option B is `pg_dump`/`pg_restore` as a
  documented fallback. Either way, dry-run against a scratch
  cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube
  would block the whole decommission, and Immich is chatty with pg
  so tailnet round-trips would hurt.

## Test plan

- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
      chain correctly.
- [ ] Per-card implementation cycles land separately (commit
      convention enforced by hook).

Reviewed-on: #356
Erich Blume 2026-05-13 16:46:17 -07:00
commit 947e4310c3
32 changed files with 820 additions and 265 deletions


@ -0,0 +1,27 @@
# CloudNativePG Operator for ringtail k3s cluster
# Deploys the operator only; PostgreSQL clusters are created separately
#
# Sibling of cloudnative-pg.yaml (minikube). Same mirror, same release,
# different destination. Both apps will coexist during the immich
# migration; the minikube one is removed at the end of the broader
# indri-k8s decommission.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git
    targetRevision: v1.27.1
    path: releases
    directory:
      include: 'cnpg-1.27.1.yaml'
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: cnpg-system
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true  # Required for large CRDs that exceed annotation size limit


@ -0,0 +1,26 @@
# Databases on ringtail k3s.
#
# Today: only immich-pg (CNPG Cluster) + its borgmatic ExternalSecret.
# More databases may move here as the indri-k8s decommission proceeds.
#
# Prerequisites:
# - cloudnative-pg-ringtail (operator must exist before the Cluster CR)
# - external-secrets-ringtail + 1password-connect-ringtail (for the
# immich-pg-borgmatic ExternalSecret to sync)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: databases-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/databases-ringtail
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: databases
  syncPolicy:
    syncOptions:
      - CreateNamespace=true


@ -0,0 +1,31 @@
# Immich on ringtail k3s.
#
# Staging deployment; the minikube `immich` app remains in parallel
# until cutover. See [[immich-cutover-and-decommission]] for the
# routing flip + minikube cleanup.
#
# Prerequisites:
# - cnpg-on-ringtail + databases-ringtail (postgres)
# - 1password-connect-ringtail + external-secrets-ringtail (not used
# by this app today — immich-db Secret is created manually,
# matching the minikube pattern)
# - The immich-db Secret in the immich namespace, holding the
# password for the `immich` postgres role (copied from the source
# immich-pg-app Secret at migration time).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich-ringtail
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/immich-ringtail
  destination:
    server: https://ringtail.tail8d86e.ts.net:6443
    namespace: immich
  syncPolicy:
    syncOptions:
      - CreateNamespace=true


@ -1,30 +0,0 @@
# Immich - Self-hosted photo and video management
# High-performance Google Photos/iCloud alternative with AI features
#
# Kustomize manifests in argocd/manifests/immich/
# Components: server, machine-learning, valkey (Redis)
#
# Prerequisites:
# 1. Create immich namespace and secrets:
# kubectl create namespace immich
# kubectl --context=minikube-indri create secret generic immich-db -n immich \
# --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)"
# 2. Create immich-pg database and user (see immich-pg app)
# 3. NFS share on sifaka at /volume1/photos with read/write for indri
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/immich
  destination:
    server: https://kubernetes.default.svc
    namespace: immich
  syncPolicy:
    syncOptions:
      - CreateNamespace=true


@ -1,9 +1,12 @@
# ExternalSecret for borgmatic backup user password on immich-pg cluster
# (ringtail k3s).
#
# Mirror of argocd/manifests/databases/external-secret-immich-borgmatic.yaml.
# The onepassword-blumeops ClusterSecretStore exists on ringtail via the
# external-secrets-ringtail app.
#
# Reuses the same 1Password item as blumeops-pg-borgmatic.
# 1Password item: "borgmatic" in blumeops vault
# Field: "db-password"
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
@ -23,7 +26,7 @@ spec:
username: borgmatic
password: "{{ .password }}"
data:
- secretKey: password
remoteRef:
key: borgmatic
property: db-password
- secretKey: password
remoteRef:
key: borgmatic
property: db-password


@ -0,0 +1,53 @@
# PostgreSQL Cluster for Immich on ringtail k3s.
#
# Initially bootstrapped via CNPG pg_basebackup from the minikube
# immich-pg cluster on 2026-05-13, then promoted to primary. The
# externalClusters + bootstrap.pg_basebackup blocks have been pruned
# from this manifest now that the migration is complete — leaving
# them around is a footgun (re-enabling replica.enabled=true would
# try to demote this cluster against a stale source). See
# [[immich-pg-data-migration]] for the procedure used.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi
    storageClass: local-path
  # Managed roles
  managed:
    roles:
      - name: borgmatic
        login: true
        connectionLimit: -1
        ensure: present
        inherit: true
        inRoles:
          - pg_read_all_data
        passwordSecret:
          name: immich-pg-borgmatic
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "500m"
  postgresql:
    shared_preload_libraries:
      - "vchord.so"
    parameters:
      max_connections: "50"
      shared_buffers: "128MB"
      password_encryption: "scram-sha-256"
    pg_hba:
      - host all all 0.0.0.0/0 scram-sha-256
      - host all all ::/0 scram-sha-256


@ -0,0 +1,9 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: databases
resources:
- immich-pg.yaml
- external-secret-immich-borgmatic.yaml
- service-immich-pg-tailscale.yaml


@ -1,6 +1,8 @@
-# Tailscale LoadBalancer for immich-pg PostgreSQL access
-# Canonical hostname: immich-pg.tail8d86e.ts.net
-# Caddy L4 proxies pg.ops.eblu.me:5433 → this service for borgmatic backups
+# Tailscale LoadBalancer for immich-pg PostgreSQL access on ringtail.
+# Canonical hostname: immich-pg.tail8d86e.ts.net (claimed from the
+# minikube side after the minikube service was removed during the
+# immich-to-ringtail migration). Borgmatic on indri uses this
+# hostname for nightly backups.
 apiVersion: v1
 kind: Service
 metadata:


@ -1,69 +0,0 @@
# PostgreSQL Cluster for Immich
# Uses VectorChord (successor to pgvecto.rs) for AI-powered vector search
# See: https://github.com/immich-app/immich/discussions/9060
# Managed by CloudNativePG operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: databases
spec:
  instances: 1
  # VectorChord image for PostgreSQL 17 with VectorChord 0.5.0
  # Immich v2.4.1 requires VectorChord >=0.3 <0.6
  # See: https://github.com/tensorchord/VectorChord
  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
  storage:
    size: 10Gi
    storageClass: standard
  # Bootstrap creates initial database and owner
  bootstrap:
    initdb:
      database: immich
      owner: immich
      postInitSQL:
        # Extensions required by Immich
        - CREATE EXTENSION IF NOT EXISTS vector;
        - CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
        - CREATE EXTENSION IF NOT EXISTS cube CASCADE;
        - CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE;
  # Managed roles
  # Note: connectionLimit, ensure, inherit are CNPG defaults added to prevent ArgoCD drift
  managed:
    roles:
      # borgmatic read-only user for backups
      - name: borgmatic
        login: true
        connectionLimit: -1
        ensure: present
        inherit: true
        inRoles:
          - pg_read_all_data
        passwordSecret:
          name: immich-pg-borgmatic
  # Resource limits for minikube environment
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "500m"
  # PostgreSQL configuration
  postgresql:
    # VectorChord requires vchord.so in shared_preload_libraries
    shared_preload_libraries:
      - "vchord.so"
    parameters:
      max_connections: "50"
      shared_buffers: "128MB"
      password_encryption: "scram-sha-256"
    pg_hba:
      # Allow connections from k8s pods
      - host all all 0.0.0.0/0 scram-sha-256
      - host all all ::/0 scram-sha-256


@ -5,13 +5,10 @@ namespace: databases
 resources:
   - blumeops-pg.yaml
-  - immich-pg.yaml
   - service-tailscale.yaml
-  - service-immich-pg-tailscale.yaml
   - service-metrics-tailscale.yaml
   - external-secret-eblume.yaml
   - external-secret-borgmatic.yaml
-  - external-secret-immich-borgmatic.yaml
   - external-secret-teslamate.yaml
   - external-secret-authentik.yaml
   - external-secret-paperless.yaml


@ -16,11 +16,16 @@ spec:
         app: immich
         component: machine-learning
     spec:
+      runtimeClassName: nvidia
       securityContext:
         seccompProfile:
           type: RuntimeDefault
       containers:
         - name: machine-learning
+          # ringtail uses the -cuda tag (set in kustomization.yaml)
+          # to take advantage of the RTX 4080 via the nvidia
+          # device plugin. Time-slicing is configured for 4 replicas
+          # so frigate + ollama + this pod can share.
           image: ghcr.io/immich-app/immich-machine-learning:kustomized
           ports:
             - name: http
@ -57,6 +62,7 @@ spec:
               cpu: "100m"
             limits:
               memory: "4Gi"
+              nvidia.com/gpu: "1"
       volumes:
         - name: cache
           persistentVolumeClaim:


@ -1,6 +1,9 @@
-# Tailscale Ingress for Immich
-# Exposes Immich at photos.tail8d86e.ts.net
-# Caddy will proxy photos.ops.eblu.me to this endpoint
+# Tailscale ProxyGroup Ingress for Immich on ringtail.
+#
+# Production hostname: photos.tail8d86e.ts.net
+# (during the cutover window this was photos-ringtail; the minikube
+# ingress was torn down before this was renamed to photos to avoid
+# the Tailscale device-name collision.)
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
@ -16,12 +19,6 @@ metadata:
     gethomepage.dev/description: "Photo management"
     gethomepage.dev/href: "https://photos.ops.eblu.me"
     gethomepage.dev/pod-selector: "app=immich,component=server"
-    # TODO: Add Immich widget - requires API key from Account Settings > API Keys
-    # See: https://gethomepage.dev/widgets/services/immich/
-    # gethomepage.dev/widget.type: "immich"
-    # gethomepage.dev/widget.url: "https://photos.ops.eblu.me"
-    # gethomepage.dev/widget.key: "{{HOMEPAGE_VAR_IMMICH_API_KEY}}"
-    # gethomepage.dev/widget.version: "2"
 spec:
   ingressClassName: tailscale
   rules:


@ -1,7 +1,8 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: immich
resources:
- deployment-server.yaml
- deployment-ml.yaml
@ -13,11 +14,15 @@ resources:
   - pv-nfs.yaml
   - pvc.yaml
   - ingress-tailscale.yaml
 images:
   - name: ghcr.io/immich-app/immich-server
     newTag: v2.6.3
   - name: ghcr.io/immich-app/immich-machine-learning
-    newTag: v2.6.3
+    # CUDA variant of the same release; ringtail has an RTX 4080
+    newTag: v2.6.3-cuda
+  # Using upstream multi-arch valkey image directly; the
+  # registry.ops.eblu.me/blumeops/valkey mirror is arm64-only (built
+  # on indri) and would crashloop on ringtail.
   - name: docker.io/valkey/valkey
-    newName: registry.ops.eblu.me/blumeops/valkey
-    newTag: v8.1.6-r0-fabca04
+    newTag: "8.1.6"


@ -0,0 +1,29 @@
# NFS PersistentVolume for Immich photo library on ringtail k3s.
#
# Mirror of argocd/manifests/immich/pv-nfs.yaml (minikube) but with
# a distinct name (minikube and ringtail are separate clusters, so PV
# names don't collide cluster-side, but using the same name in two
# manifests is confusing).
#
# The sifaka NFS export for /volume1/photos already permits
# 192.168.1.0/24 + 100.64.0.0/10. Ringtail's wired IP (192.168.1.21)
# falls in the first CIDR, so no DSM rule changes are needed.
#
# Verified 2026-05-13: ringtail pod can read existing dirs, write
# new files, and delete them. DNS resolves sifaka to 192.168.1.203
# (LAN), so NFS traffic stays off the tailnet — avoids the known
# sifaka-tailscale-userspace bite.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs-pv-ringtail
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: sifaka
    path: /volume1/photos


@ -1,5 +1,5 @@
-# PersistentVolumeClaim for Immich photo library
-# Binds to the NFS PV for sifaka:/volume1/photos
+# PersistentVolumeClaim for Immich photo library on ringtail.
+# Binds to immich-library-nfs-pv-ringtail (sifaka:/volume1/photos).
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
@ -9,7 +9,7 @@ spec:
   accessModes:
     - ReadWriteMany
   storageClassName: ""
-  volumeName: immich-library-nfs-pv
+  volumeName: immich-library-nfs-pv-ringtail
   resources:
     requests:
       storage: 2Ti


@ -1,115 +0,0 @@
# Immich
Self-hosted photo and video management solution with AI-powered search and face recognition.
## Prerequisites
1. **NFS Share**: Create `/volume1/photos` on sifaka with NFS permissions for indri
2. **PostgreSQL**: The `immich-pg` cluster (with pgvecto.rs) must be healthy
3. **Secrets**: Create the database password secret
## Deployment Order
1. Sync `blumeops-pg` (to get CloudNativePG operator if not already running)
2. Wait for `immich-pg` cluster to be healthy
3. Create secrets (see below)
4. Sync `immich` (deploys all resources: storage, services, deployments)
5. Run `mise run provision-indri -- --tags caddy` to update Caddy config
## Components
| Component | Deployment | Service | Port |
|-----------|------------|---------|------|
| Server (web/API) | `immich-server` | `immich-server` | 2283 |
| Machine Learning | `immich-machine-learning` | `immich-machine-learning` | 3003 |
| Valkey (Redis) | `immich-valkey` | `immich-valkey` | 6379 |
## Secret Setup
The `immich-db` secret contains the database password, which is auto-generated by CloudNativePG
in the `immich-pg-app` secret. To create or regenerate the secret:
```bash
# Create namespace if needed
kubectl --context=minikube-indri create namespace immich
# Copy password from CNPG secret to immich namespace
kubectl --context=minikube-indri create secret generic immich-db -n immich \
--from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)"
```
Note: This secret is not managed by ExternalSecrets since the source of truth is the CNPG-generated secret.
## Access
- **URL**: https://photos.ops.eblu.me (after Caddy is updated)
- **Tailscale**: https://photos.tail8d86e.ts.net (direct)
## First-Time Setup
1. Navigate to https://photos.ops.eblu.me
2. Create an admin account
3. Configure external library (optional - for importing existing photos)
## External Library (iCloud Photos)
To import existing photos from iCloud sync on indri:
1. In Immich Admin > External Libraries, create a new library
2. Set the import path to the location where iCloud photos sync
3. Configure scan schedule or trigger manual scan
## Architecture
```
┌─────────────────┐     ┌─────────────────┐
│  immich-server  │────▶│    immich-pg    │
│   (web/api)     │     │  (PostgreSQL    │
└────────┬────────┘     │  + pgvecto.rs)  │
         │              └─────────────────┘
┌────────▼────────┐     ┌─────────────────┐
│   immich-ml     │     │     valkey      │
│  (ML inference) │     │  (Redis cache)  │
└─────────────────┘     └─────────────────┘
┌────────▼────────┐
│   sifaka NFS    │
│ /volume1/photos │
└─────────────────┘
```
## Version Management
Image versions are controlled via `kustomization.yaml`:
```yaml
images:
- name: ghcr.io/immich-app/immich-server
newTag: v2.6.3
- name: ghcr.io/immich-app/immich-machine-learning
newTag: v2.6.3
- name: docker.io/valkey/valkey
newTag: "8.1-alpine"
```
To upgrade, update `newTag` values and sync via ArgoCD.
## Troubleshooting
```bash
# Check pods
kubectl --context=minikube-indri -n immich get pods
# Check immich-pg cluster
kubectl --context=minikube-indri -n databases get cluster immich-pg
# View server logs
kubectl --context=minikube-indri -n immich logs -l app=immich,component=server
# View ML logs
kubectl --context=minikube-indri -n immich logs -l app=immich,component=machine-learning
# Check PVC binding
kubectl --context=minikube-indri -n immich get pvc
```


@ -1,22 +0,0 @@
# NFS PersistentVolume for Immich photo library
# Requires: NFS share on sifaka at /volume1/photos with NFS permissions for indri
#
# To create on Synology:
# 1. Control Panel > Shared Folder > Create
# 2. Name: photos, Location: Volume 1
# 3. Control Panel > File Services > NFS > NFS Rules
# 4. Add rule for "photos" share: Hostname=indri, Privilege=Read/Write, Squash=No mapping
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs-pv
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: sifaka
    path: /volume1/photos


@ -11,4 +11,4 @@ data:
       timeSlicing:
         resources:
           - name: nvidia.com/gpu
-            replicas: 2
+            replicas: 4


@ -0,0 +1,13 @@
Move the entire Immich stack — server, machine-learning, valkey,
and the PostgreSQL+VectorChord cluster — off `minikube-indri` and
onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG
`pg_basebackup` (replica catch-up then promote); row counts on
`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face`
verified equal between source and replica before cutover. The ML
pod now uses ringtail's RTX 4080 via the nvidia-device-plugin
(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy
routing at `photos.ops.eblu.me` is unchanged (still
`photos.tail8d86e.ts.net`, the device just lives on ringtail now).
Borgmatic backups continue against the same `immich-pg` tailnet
hostname. First concrete chain in the broader indri-k8s
decommission effort.


@ -0,0 +1,52 @@
---
title: CNPG Operator on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- postgres
- ringtail
---
# CNPG Operator on Ringtail
Bring up the `cloudnative-pg` operator on `k3s-ringtail`. Today the
operator only exists on `minikube-indri` (see
`argocd/apps/cloudnative-pg.yaml`, destination `kubernetes.default.svc`).
Prerequisite of [[migrate-immich-to-ringtail]]; consumed by
[[immich-pg-on-ringtail]].
## What to do
- Add a sibling `argocd/apps/cloudnative-pg-ringtail.yaml` pointing
at the same mirror (`mirrors/cloudnative-pg`, tag `v1.27.1`),
destination `https://ringtail.tail8d86e.ts.net:6443`,
namespace `cnpg-system`.
- Mirror the `ServerSideApply=true` and `CreateNamespace=true` sync
options (the CRDs exceed the annotation size limit).
- Sync `apps` then `cloudnative-pg-ringtail`. Verify the operator
pod is running on ringtail.
## Verification
```fish
kubectl --context=k3s-ringtail -n cnpg-system get pods
kubectl --context=k3s-ringtail get crd clusters.postgresql.cnpg.io
```
## Why a separate app
Each ArgoCD app targets a single cluster via `destination.server`.
We could parameterize with ApplicationSets, but blumeops' convention
is to duplicate the manifest with a `-ringtail` suffix (see
`alloy-ringtail`, `external-secrets-ringtail`, etc.). Keep the
convention.
## Out of scope
- Postgres clusters themselves (`immich-pg`, etc.) — those come from
[[immich-pg-on-ringtail]].
- Removing the minikube cnpg operator. That happens at the very end
of the indri-k8s decommission, not in this chain.


@ -0,0 +1,91 @@
---
title: Immich App on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- immich
---
# Immich App on Ringtail
Bring up `immich-server`, `immich-machine-learning`, and
`immich-valkey` on ringtail. This card stands the stack up against
the *new* pg cluster — it does not move user traffic. Cutover lives
in [[immich-cutover-and-decommission]].
## What to do
- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix
  matches the `-ringtail` convention used by other apps). Port from
  `argocd/manifests/immich/`:
  - `deployment-server.yaml`: point `DB_HOSTNAME` at the ringtail
    pg service.
  - `deployment-ml.yaml`: use `runtimeClassName: nvidia` + a
    `resources.limits` for `nvidia.com/gpu: 1`. Use the `-cuda` tag
    of the immich-ml image (set in kustomization). Ringtail is
    single-node, so no node selector needed. See
    `argocd/manifests/frigate/` for the existing GPU pod pattern.

    **GPU contention discovery:** ringtail's `nvidia-device-plugin`
    is configured with `timeSlicing.replicas: 2`. Frigate + Ollama
    already consume both virtual slices. Adding immich-ml requires
    bumping the count to >= 3. Edit
    `argocd/manifests/nvidia-device-plugin/configmap.yaml` (or
    wherever the device-plugin config lives) and re-sync the
    `nvidia-device-plugin` ArgoCD app. The plugin pod restarts and
    the new advertised count appears as the node's
    `nvidia.com/gpu` allocatable.
  - `deployment-valkey.yaml`: straight port, BUT use the upstream
    multi-arch `docker.io/valkey/valkey:<version>` image. Do NOT
    use the `registry.ops.eblu.me/blumeops/valkey` rewrite in the
    kustomization. That mirror was built on indri (arm64) and is
    single-arch; pulling it on ringtail (amd64) gets `exec format
    error` in CrashLoopBackOff. The mirror should eventually carry
    a multi-arch tag, at which point the rewrite can return.
  - `service*.yaml`: straight port.
  - `pvc-ml-cache.yaml`: straight port (empty `local-path` PVC).
  - `pv-nfs.yaml` + `pvc.yaml`: already covered by
    [[sifaka-nfs-from-ringtail]] (may live in this dir or theirs).
  - `ingress-tailscale.yaml`: ProxyGroup ingress, **must not** set
    an explicit `host:` (or use `host: *`) per the lesson on
    ProxyGroup VIP routing.

    **Hostname collision warning:** the minikube ingress claims the
    Tailscale device name `photos` (`tls.hosts: [photos]`). Two
    devices on the tailnet cannot share that name. While the
    ringtail deployment is being staged it must use a *different*
    `tls.hosts` value (e.g. `photos-ringtail`) so it can coexist
    with the running minikube one. The flip to `photos` happens at
    cutover time, *after* the minikube ingress has been removed.
    See [[immich-cutover-and-decommission#Cutover sequence]].
  - `kustomization.yaml`: same `images:` block (server, ML, valkey).
- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting
  ringtail, namespace `immich`. **Manual sync only** until the
  cutover.
- Existing `argocd/apps/immich.yaml` (minikube) stays untouched
  during this card; both apps exist briefly.
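The GPU contention note above can be checked before syncing the ML deployment. This is a minimal sketch, assuming ringtail is the only node (so `.items[0]` is the GPU node) and that Frigate and Ollama each hold exactly one slice. The helper names `has_free_slice` and `gpu_allocatable` are hypothetical and not part of the repo.

```bash
# Succeeds when the advertised slice count leaves room for one more GPU pod.
# $1 = allocatable nvidia.com/gpu on the node, $2 = slices already requested.
has_free_slice() {
  allocatable="$1"
  in_use="$2"
  [ "$allocatable" -gt "$in_use" ]
}

# Prints the node's allocatable nvidia.com/gpu count.
# ASSUMPTION: single-node cluster, kube context named k3s-ringtail.
gpu_allocatable() {
  kubectl --context=k3s-ringtail get nodes \
    -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}'
}

# Example (needs cluster access):
#   has_free_slice "$(gpu_allocatable)" 2 || echo "bump timeSlicing.replicas first"
```

If the check fails, the ML pod would stay Pending on `nvidia.com/gpu` until the device-plugin config is bumped and re-synced.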
## Bring it up against a copy of the DB
Use the throwaway/test path from [[immich-pg-data-migration#Dry run
before real cutover]]: point the ringtail immich at the *test* pg
cluster first, verify the pod boots, the web UI loads (via
`kubectl port-forward`), assets list, ML embeddings query. Then
tear it down.
## Verification
- All three pods Ready.
- ML pod has a GPU attached: `nvidia-smi` inside the container shows
the 4080.
- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in
logs).
- A `kubectl port-forward` to the server service shows the Immich
web UI.
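The checks above could be scripted roughly as follows. This is a sketch under stated assumptions: the deployment name `immich-machine-learning` and the `app=immich,component=server` labels come from the minikube manifests, and it assumes all three pods carry the `app=immich` label. The function names are hypothetical.

```bash
CTX="${CTX:-k3s-ringtail}"
NS="${NS:-immich}"

# Reads log text on stdin; succeeds only if no ECONNREFUSED line is present.
no_conn_refused() {
  ! grep -q 'ECONNREFUSED'
}

# Runs the three verification steps in order; stops at the first failure.
verify_immich_ringtail() {
  # 1. All pods Ready (ASSUMPTION: every pod is labelled app=immich).
  kubectl --context="$CTX" -n "$NS" wait pod -l app=immich \
    --for=condition=Ready --timeout=300s || return 1
  # 2. The ML container can see a GPU; the output should name the 4080.
  kubectl --context="$CTX" -n "$NS" exec deploy/immich-machine-learning \
    -- nvidia-smi -L || return 1
  # 3. Recent server logs contain no refused connections to pg or valkey.
  kubectl --context="$CTX" -n "$NS" logs -l app=immich,component=server \
    --tail=200 | no_conn_refused
}
```

The web UI check stays manual: `kubectl port-forward` to the server service and load it in a browser.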
## Out of scope
- Public/tailnet routing flip. Caddy still points at the minikube
Tailscale ingress until [[immich-cutover-and-decommission]].
- Removing the minikube immich. Same.


@ -0,0 +1,103 @@
---
title: Immich Cutover and Decommission
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- immich
- migration
---
# Immich Cutover and Decommission
The user-visible flip. By the time this card opens, the ringtail
stack has been proven against a copy of the data. This card does the
real cutover.
## Pre-cutover checklist
- [[immich-pg-data-migration]] dry-run succeeded; method is chosen.
- Ringtail immich stack has been brought up against the test pg,
pods healthy, UI loaded ([[immich-app-on-ringtail#Verification]]).
- Borgmatic just ran successfully (a fresh nightly archive is a
belt-and-suspenders fallback, on top of the live source pg).
- User has been told to stop uploading from the iOS app for the
cutover window.
## Cutover sequence
1. **Quiesce source.** `kubectl --context=minikube-indri -n immich
scale deploy/immich-server --replicas=0` and same for ML. Leave
valkey + pg running. Confirm no client traffic on the source pg
via `pg_stat_activity`.
2. **Tear down the minikube Tailscale ingress.** The `photos`
Tailscale device name must be freed before ringtail's ingress can
claim it (Tailscale enforces uniqueness across the tailnet).
`kubectl --context=minikube-indri -n immich delete ingress
immich-tailscale` and wait for the corresponding `tailscale`-LB
StatefulSet pod to terminate. Verify the `photos` device is gone:
`tailscale status | grep -i photos` from any tailnet host.
3. **Final sync.** Per chosen method in
   [[immich-pg-data-migration]]:
   - Option A: promote the ringtail replica.
   - Option B: take final `pg_dump`, restore to ringtail
     `immich-pg`.
4. **Verify.** Run the row-count and schema-diff checks from
[[immich-pg-data-migration#Verification on the real run]].
5. **Flip the ringtail ingress to `photos`.** Update
`argocd/manifests/immich-ringtail/ingress-tailscale.yaml`:
`tls.hosts: [photos]` (was `[photos-ringtail]` during staging per
[[immich-app-on-ringtail]]). Commit, `argocd app sync
immich-ringtail`. Wait for the `photos` device to register on the
tailnet again.
6. **Bring up ringtail immich** against the now-promoted pg
(`argocd app sync immich-ringtail`). Wait for Ready.
7. **Flip routing.** Update Caddy on indri
(`ansible/roles/caddy/defaults/main.yml`): `photos.ops.eblu.me`
upstream changes to the ringtail Tailscale ingress hostname
(`photos` — same MagicDNS name, now pointing to the ringtail
proxy). `mise run provision-indri -- --tags caddy`.
8. **Smoke test.** Open `photos.ops.eblu.me` in a browser. Sign in.
Scroll the timeline. Open an album. Trigger an ML search.
9. **Update borgmatic.** If the Tailscale hostname for pg changed,
update `borgmatic.cfg` on indri to point at the ringtail
`immich-pg-tailscale` service. Run a manual backup to verify.
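Steps 1 and 2 are the ones most likely to be rushed, so a scripted form may help. This is a sketch only: it assumes the hostname is the second column of `tailscale status` output and that the ML deployment is named `immich-machine-learning` (as in the minikube manifests). The function names are hypothetical.

```bash
CTX_SRC="${CTX_SRC:-minikube-indri}"

# Reads `tailscale status` output on stdin; succeeds if a device whose
# hostname column equals $1 exactly is listed.
device_listed() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Step 1: stop the source app pods; valkey and pg keep running.
quiesce_source() {
  kubectl --context="$CTX_SRC" -n immich scale deploy/immich-server --replicas=0
  kubectl --context="$CTX_SRC" -n immich scale deploy/immich-machine-learning --replicas=0
}

# Step 2: delete the minikube ingress, then poll until the tailnet
# no longer lists a device named photos.
free_photos_device() {
  kubectl --context="$CTX_SRC" -n immich delete ingress immich-tailscale
  while tailscale status | device_listed photos; do
    sleep 5
  done
}
```

Matching on the exact hostname column avoids a false positive from `photos-ringtail`, which a plain `grep -i photos` would also match during staging.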
## After cutover
- `argocd app set immich --revision <branch>` is no longer relevant;
the minikube `immich` app gets deleted entirely.
- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and
the minikube `argocd/manifests/databases/immich-pg.yaml` +
`external-secret-immich-borgmatic.yaml` +
`service-immich-pg-tailscale.yaml`.
- Rename `immich-ringtail` back to `immich` (the `-ringtail` suffix
was scaffolding for the dual-cluster window; once minikube is
empty of immich, the unsuffixed name is clean).
- Confirm the minikube `immich-pg` PVC is no longer used, then
delete it (the PV with `Retain` policy will persist — clean that
up too).
## Verification (definition of done)
- `photos.ops.eblu.me` works for a real session, including ML search.
- Source minikube has no `immich` pods, no `immich-pg`, no PVCs.
- Memory pressure on minikube has dropped (≥1.5 GiB reclaimed). Check
`docker stats minikube` on indri.
- Nightly borgmatic run after the cutover completes successfully,
with the immich-pg archive showing the new source.
## Rollback (within the cutover window)
If smoke test fails: flip Caddy back, scale ringtail immich to 0,
scale source immich back up. Source pg was never destroyed. File a
plan reset on the relevant prerequisite card and try again next
session.
## Out of scope
- Decommissioning all of minikube. This chain just removes immich.
Other tenants migrate in their own chains as part of the broader
indri-k8s decommission. See [[migrate-immich-to-ringtail]] for
context.


@ -0,0 +1,79 @@
---
title: Immich Postgres Data Migration
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- postgres
- immich
- critical
---
# Immich Postgres Data Migration
**This is the data-loss surface of the migration.** Pick a method,
prove it on a throwaway copy first, then run the real cutover.
## Decision: pick one
### Option A — CNPG `externalCluster` bootstrap (preferred)
Stand the ringtail cluster up as a streaming replica of the minikube
cluster via `bootstrap.pg_basebackup.source`. Replica catches up
online; when ready, promote it and point Immich at it. This is
CNPG's documented PG-to-PG migration path and gives near-zero data
loss (the WAL position at promote == the position at app stop).
Requires: network path from ringtail to minikube's pg over the
tailnet (the existing `immich-pg-tailscale` Service works), and a
superuser secret minikube-side exposed to ringtail's basebackup.
Pitfall to plan around: the ringtail Cluster CR will need its
`bootstrap` block rewritten *after* promotion (CNPG doesn't
gracefully drop the externalCluster reference). Account for this in
[[immich-pg-on-ringtail]] — it may force a reset of that card.
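For orientation, the replica-bootstrap fragment of the ringtail Cluster CR might look roughly like the output of the helper below. This is a sketch, not the manifest that was applied: the externalCluster name `immich-pg-minikube` and the Secret name `immich-pg-source-superuser` are placeholders, and connecting as `postgres` follows this card's superuser requirement rather than a confirmed setup.

```bash
# Prints a hypothetical spec fragment for a CNPG replica bootstrap.
# $1 = hostname of the source pg (e.g. the immich-pg tailnet name).
render_replica_bootstrap() {
  source_host="$1"
  cat <<EOF
spec:
  bootstrap:
    pg_basebackup:
      source: immich-pg-minikube          # ASSUMED externalCluster name
  replica:
    enabled: true                         # set to false to promote
    source: immich-pg-minikube
  externalClusters:
    - name: immich-pg-minikube
      connectionParameters:
        host: ${source_host}
        user: postgres
        dbname: postgres
        sslmode: require
      password:
        name: immich-pg-source-superuser  # ASSUMED Secret on ringtail
        key: password
EOF
}

# Example: render_replica_bootstrap immich-pg.tail8d86e.ts.net
```

Promotion then amounts to flipping `replica.enabled` to `false`. The remaining `bootstrap` and `externalClusters` blocks are the cleanup this card warns about.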
### Option B — pg_dump / pg_restore
Stop immich, `pg_dump -Fc` from minikube, scp to ringtail, restore.
Simpler but full downtime for the whole dump+restore window
(measure on a copy first — VectorChord indexes are slow to rebuild).
Smaller blast radius; no streaming-replication moving parts.
Use this if Option A hits any blocker. Data loss should still be
zero if the source is stopped first.
### Option C — leave pg on minikube
Rejected. See goal card [[migrate-immich-to-ringtail#Why postgres on
ringtail (not cross-cluster)]].
## Dry run before real cutover
Whichever option wins:
1. Snapshot the minikube `immich-pg` PVC or take a fresh `pg_dump`
into a scratch location.
2. Restore into a *separate* ringtail CNPG cluster (different name,
e.g. `immich-pg-test`) and point a scratch immich-server pod at
it.
3. Verify: pod boots, can list assets, ML embeddings query without
error, face thumbnails render. VectorChord-backed queries should
not error.
4. Tear the scratch cluster down before doing the real one.
## Verification on the real run
- Row counts match for `asset`, `user`, `album`, `activity`,
  `asset_face`, `smart_search` (the embedding table); script this.
- `pg_dump --schema-only --no-owner` diff between source and dest
should be empty modulo CNPG-managed roles.
- Immich `/api/server/version` and `/api/server/statistics`
(`/api/server-info/*` on older releases) return sane numbers.
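
The row-count check in the list above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not existing tooling: the function names `row_counts` and `compare_counts` are made up for illustration, the table list is copied from the bullet above and should be checked against the live schema, and the `kubectl` context, namespace, and pod names in the usage comments are guesses.

```shell
# Tables named in the verification list (verify against the real schema).
TABLES="assets albums users face asset_face smart_search"

# Print one "table count" line per table in $TABLES.
# "$@" is the command prefix that runs psql against one cluster; the
# function appends the psql flags (-A unaligned, -t tuples only, -c query).
row_counts() {
    for t in $TABLES; do
        n=$("$@" -At -c "SELECT count(*) FROM \"$t\"") || return 1
        printf '%s %s\n' "$t" "$n"
    done
}

# Compare two files produced by row_counts. Prints the diff and
# returns 1 on any mismatch, returns 0 when the counts are identical.
compare_counts() {
    if diff "$1" "$2"; then
        echo "row counts match"
    else
        echo "row count mismatch" >&2
        return 1
    fi
}

# Usage (context, namespace, and pod names are assumptions):
#   row_counts kubectl --context=minikube-indri -n databases exec -i immich-pg-1 -- \
#       psql -U postgres -d immich > /tmp/src.counts
#   row_counts kubectl --context=k3s-ringtail -n databases exec -i immich-pg-1 -- \
#       psql -U postgres -d immich > /tmp/dst.counts
#   compare_counts /tmp/src.counts /tmp/dst.counts
```

Run both counts only after Immich has been stopped on the source; otherwise the numbers can drift between the two queries.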
## Rollback
If the cutover fails verification: stop the ringtail immich, repoint
ArgoCD `immich.destination` back to minikube, re-sync. Source pg was
never deleted. Document what failed and reset the chain.

@ -0,0 +1,69 @@
---
title: Immich Postgres Cluster on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- postgres
- immich
---
# Immich Postgres Cluster on Ringtail
Stand up a fresh `immich-pg` CNPG Cluster on ringtail, ready to receive
data. **No data import yet** — that's [[immich-pg-data-migration]].
## What to do
- Create `argocd/manifests/databases-ringtail/` (or pick another
namespace name — verify what other ringtail pg clusters will use;
if none yet, `databases` is fine).
- Port these from the minikube side:
  - `immich-pg.yaml`: the CNPG Cluster CR. Same image
    (`ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0`), same
    extensions, same managed `borgmatic` role. Bump `storage.size` if
    the minikube 10 GiB looks tight (check actual usage first).
    `storageClass: local-path` on ringtail (default).
  - `external-secret-immich-borgmatic.yaml`: same 1Password item,
    same field, but referencing the ringtail `ClusterSecretStore`
    (`onepassword-blumeops` already exists per the
    `external-secrets-ringtail` app).
  - Service for in-cluster access (the operator creates `immich-pg-rw`
    etc. automatically; verify the app deployment uses those names).
  - A Tailscale Service if we want backups to keep working via the
    same hostname during the transition (see "Borgmatic" below).
- New ArgoCD app `argocd/apps/databases-ringtail.yaml` pointing at
the new path, destination ringtail.
## Verification
- Cluster reaches `Ready`.
- `borgmatic` role exists, `rolcanlogin=t`, and is a member of
`pg_read_all_data` (via `managed.roles[].inRoles`).
- ExternalSecret `immich-pg-borgmatic` syncs from 1Password
(`Ready: True`) and the rendered Secret has `username=borgmatic`.
- The `vchord`, `vector`, `cube`, `earthdistance` extensions show
installed in the `postgres` database (`\dx` from
`psql -U postgres`). They are NOT installed in the `immich`
database at this point — `postInitSQL` in CNPG's `initdb` block
runs against the `postgres` superuser database. The Immich app
itself creates the extensions in its own `immich` database at
startup; do not be alarmed by their absence pre-immich-deploy.
The `vchord.so` library is preloaded via
`shared_preload_libraries` regardless, so `CREATE EXTENSION` at
app startup just registers it in the right database.
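
The extension check above can be automated with a small filter. This is a sketch only: `missing_extensions` is a hypothetical helper name, and the way `psql` is reached in the usage comments (user, database, running it inside the primary pod) is an assumption about this setup.

```shell
# Extensions the verification list expects in the `postgres` database.
REQUIRED_EXTENSIONS="vchord vector cube earthdistance"

# Read installed extension names on stdin (one per line) and print every
# required extension that is absent. Prints nothing when all are present.
missing_extensions() {
    installed=$(cat)
    for ext in $REQUIRED_EXTENSIONS; do
        printf '%s\n' "$installed" | grep -qxF "$ext" || printf '%s\n' "$ext"
    done
}

# Usage (assumed to run where psql can reach the new cluster):
#   psql -U postgres -d postgres -At -c "SELECT extname FROM pg_extension" \
#       | missing_extensions
#
# The borgmatic role can be checked in the same session; both columns
# should come back as `t`:
#   SELECT rolcanlogin, pg_has_role('borgmatic', 'pg_read_all_data', 'member')
#   FROM pg_roles WHERE rolname = 'borgmatic';
```

Empty output means all four extensions are registered. Pointing the same pipeline at the `immich` database before the app has started is expected to list all four as missing, for the reason given above.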
## Borgmatic implications
`borgmatic.cfg` on indri targets `immich-pg-tailscale` over the
tailnet. During migration both clusters will exist briefly. Decide
upfront: back up the *source* pg until cutover, then flip borgmatic
to the ringtail Tailscale service. Document the flip in
[[immich-cutover-and-decommission]].
## Out of scope
- Importing data. That is [[immich-pg-data-migration]], which may
drive a reset on this card if the migration approach (e.g. CNPG
`externalCluster` bootstrap) requires changes to this Cluster CR.

@ -0,0 +1,132 @@
---
title: Migrate Immich to Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- immich
- migration
---
# Migrate Immich to Ringtail
Move the entire Immich stack (server, ML, valkey, postgres) off
`minikube-indri` and onto `k3s-ringtail`. This is the first concrete
chain in the broader indri-k8s decommission: minikube is
memory-saturated (97% RAM, swapping), and Immich is the single
largest tenant (~1.5 GiB resident).
## End state
- Immich `server`, `machine-learning`, and `valkey` Deployments run on
ringtail k3s in the `immich` namespace.
- The `immich-machine-learning` pod uses ringtail's RTX 4080 via the
`nvidia-device-plugin` (performance win — currently CPU-only on
minikube).
- A CNPG `immich-pg` Cluster (PostgreSQL 17 + VectorChord) runs in a
`databases` namespace on ringtail, owned by the `cnpg-system`
operator on ringtail.
- The photo library still lives on [[sifaka]] at `/volume1/photos`,
mounted via NFS from ringtail pods (RWX).
- Routing: `photos.ops.eblu.me` (Caddy on indri) proxies to a
Tailscale ProxyGroup ingress on ringtail. No public surface today.
- The ArgoCD `immich` app's `destination.server` points at
`https://ringtail.tail8d86e.ts.net:6443`. The old minikube
manifests are removed.
## Non-goals
- Public exposure via Fly. Immich stays tailnet-only.
- Changing the immich version or runtime configuration. This is a
lift-and-shift; bumps come later.
- Backing up to a different target. [[borgmatic]] keeps running on
indri (it pulls via Tailscale and uses sifaka SMB for the library).
## Critical constraint: no data loss
Downtime is acceptable (Immich is a single-user system; we can take
it offline for the cutover). **Data loss is not.** Two surfaces matter:
1. **Postgres** — face data, ML embeddings (vectors), album state,
sharing, etc. Re-derivable in theory; weeks of recompute in
practice. See [[immich-pg-data-migration]].
2. **Library files** at `/volume1/photos`. Not moving, but the NFS
path must be verified accessible from ringtail before cutover.
See [[sifaka-nfs-from-ringtail]].
[[borgmatic]] backs both up to sifaka + BorgBase nightly; restore is
possible but slow. Treat it as a fallback, not a plan.
## Why postgres on ringtail (not cross-cluster)
`immich-pg` already has a Tailscale Service we could point ringtail
at, leaving the DB on minikube. We're not doing that because:
- The whole goal is to retire minikube — keeping pg there blocks it.
- Immich is chatty against pg; tailnet round-trips would hurt.
- CNPG is the same operator on both sides — a Cluster CR on ringtail
is mechanically equivalent.
## Approach
This is a C2 Mikado chain. The prerequisite cards each represent a
distinct surface that has to work before cutover. See
[[agent-change-process#C2 — Mikado Chain]] for the discipline.
## Workflow note: registering new ArgoCD apps during the chain
This chain adds three new ArgoCD `Application` definitions in
`argocd/apps/`: `cloudnative-pg-ringtail`, `databases-ringtail`,
and (later) `immich-ringtail`. The usual C1/C2 pattern of
`argocd app set <app> --revision <branch> && argocd app sync <app>`
does NOT work for the app-of-apps `apps` Application itself, because
`apps` self-manages: it re-reads `apps.yaml` (which declares
`targetRevision: main`) on every sync and reverts the override. As a
result, new app definitions added on a feature branch are never
visible to the cluster via `apps`.
**Use `kubectl apply` to register each new Application directly:**
```fish
kubectl --context=minikube-indri apply -f argocd/apps/<new-app>.yaml
```
This creates the Application resource out-of-band, bypassing `apps`.
For apps whose source lives in **this** repo (e.g.
`databases-ringtail`, `immich-ringtail` — manifest paths exist only
on the branch until merge), follow the apply with a branch override:
```fish
argocd app set <new-app> --revision mikado/migrate-immich-to-ringtail
argocd app sync <new-app>
```
For apps whose source is an **external** repo at a pinned tag (e.g.
`cloudnative-pg-ringtail` → `mirrors/cloudnative-pg` `v1.27.1`), no
override is needed — the source revision is independent of this PR.
After PR merge:
```fish
argocd app set <new-app> --revision main
argocd app sync <new-app>
```
`apps` itself, on its next sync from `main`, will discover the new
Application definitions in `argocd/apps/` and adopt the already-running
resources without disruption — provided their in-cluster spec matches
the on-disk definitions (which it does because we applied the same
file).
## Related
- [[shower-on-ringtail]] — a previous migration to ringtail (simpler:
no upstream cluster, SQLite, no GPU)
- [[connect-to-postgres]] — getting a psql session against CNPG
- [[ringtail]] — the target cluster
- [[cnpg-on-ringtail]], [[immich-pg-on-ringtail]],
[[immich-pg-data-migration]], [[sifaka-nfs-from-ringtail]],
[[immich-app-on-ringtail]], [[immich-cutover-and-decommission]] —
the prerequisite cards

@ -0,0 +1,67 @@
---
title: Sifaka NFS Photos from Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
tags:
- how-to
- operations
- storage
- nfs
- sifaka
---
# Sifaka NFS Photos from Ringtail
The Immich library lives at `sifaka:/volume1/photos` and is mounted
into the pod via an NFS PV (see `argocd/manifests/immich/pv-nfs.yaml`).
That PV is currently scoped to indri. We need ringtail to mount the
same path with the same RWX semantics, without breaking the existing
indri mount during the transition.
## What to verify / do
- Check `sifaka` DSM NFS rules for the `photos` share. Per
[[shower-on-ringtail#NFS + SMB share on sifaka]] convention, rules
use `192.168.1.0/24` + `100.64.0.0/10` with
`all_squash`/`Map all users to admin`. The existing rule may
already cover ringtail (it's on `192.168.1.21` per the recent
static-IP pin). If so this card is a verification card.
- If the rule is locked to indri's IP: add an entry for ringtail
(192.168.1.21) or widen to the subnet pattern above.
- Test mount from a ringtail debug pod (busybox or alpine with
nfs-utils) against the `photos` share. Read a file. Write a temp
file. Delete it.
- Watch for the known sifaka NFS-over-Tailscale gotcha: sifaka's
Tailscale must be in TUN mode (not userspace) for NFS to work
reliably over the tailnet. The NFS path here goes over the LAN
(not tailnet), so this shouldn't bite, but worth confirming the
NFS traffic is on `192.168.1.x` not `100.x`.
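
The mount test described above could be wrapped in a small function run inside the debug pod. This is a minimal sketch: `nfs_smoke_test` is a hypothetical name, and the mount point passed to it (for example `/mnt/photos`) is an assumption about where the share ends up in the pod. It lists the directory, writes a temp file, reads it back, and deletes it.

```shell
# List, write, read back, and delete a probe file under the given directory.
# Prints "ok: <dir>" on success; prints a FAIL line to stderr and returns 1
# on the first step that does not work.
nfs_smoke_test() {
    dir=$1
    probe="$dir/.nfs-smoke-test-$$"
    token="written-by-pid-$$"

    # The directory must be listable.
    ls "$dir" >/dev/null 2>&1 || { echo "FAIL: cannot list $dir" >&2; return 1; }
    # Create the probe file.
    printf '%s\n' "$token" > "$probe" || { echo "FAIL: cannot write $probe" >&2; return 1; }
    # Read back what was just written.
    if [ "$(cat "$probe")" != "$token" ]; then
        echo "FAIL: read-back mismatch on $probe" >&2
        rm -f "$probe"
        return 1
    fi
    # Delete the probe and confirm it is gone.
    rm "$probe" || { echo "FAIL: cannot delete $probe" >&2; return 1; }
    [ ! -e "$probe" ] || { echo "FAIL: $probe still exists" >&2; return 1; }
    echo "ok: $dir"
}

# Usage inside the ringtail debug pod (mount point is an assumption):
#   nfs_smoke_test /mnt/photos
```

To confirm the traffic stays on the LAN, the `addr=` option on the NFS line in `/proc/mounts` should show a `192.168.1.x` address rather than a `100.x` one.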
## PV + PVC on ringtail
- New `pv-nfs.yaml` mirroring the minikube one. PV objects are
cluster-scoped but each cluster has its own set, so duplicate the
manifest (the same name can be reused). Same `server: sifaka`, same
path, same `accessModes: [ReadWriteMany]`,
`persistentVolumeReclaimPolicy: Retain`.
- New `pvc.yaml` in the ringtail `immich` namespace bound to it.
- The minikube PVC stays bound and active until cutover — both
clusters can have the share NFS-mounted simultaneously (NFS RWX
permits this). Immich itself must not be running on both sides
at once.
## Verification
- A pod on ringtail can `ls /mnt/photos/` and see the same files
as the indri pod.
- A file written from the ringtail pod is visible from the indri pod
and vice versa (proves there's no caching surprise).
## Out of scope
- Migrating photo files. Nothing moves; this is just adding a second
NFS client.
- The `pvc-ml-cache.yaml` PVC (a separate ML model cache). That's
not on NFS — it's a regular PVC. Recreated empty on ringtail in
[[immich-app-on-ringtail]]; the first ML pod boot will repopulate
it.