Add Kubernetes migration plan documentation #24

Merged
eblume merged 21 commits from feature/k8s-migration-plan into main 2026-01-17 17:34:54 -08:00

Author SHA1 Message Date
bcd96d86f0 Phase 0 review fixes
- Bump podman disk-size to 220G (> minikube's 200G)
- Fix Step 0.3 test to use curl instead of podman (not installed yet)
- Simplify Step 0.5 zot metrics to just zot_up for now
- Add Backup Strategy section to Technical Decisions
- Add zot restart handler to Step 0.3
- Move dashboard steps to Phase 0 Follow-up section
- Renumber steps (0.14->0.12, 0.15->0.13)
- Fix Modified Files Summary (tag:k8s deferred to Phase 1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 17:32:55 -08:00
a31e8935c9 Remove Step 0.16 (NFS on Sifaka)
Nothing in Phase 0 requires NFS, and it's per-share config anyway.
Will add when actually needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 17:11:34 -08:00
5f35084176 Add Step 0.16: Enable NFS on Sifaka, bump minikube disk to 200g
- Add manual step for enabling NFS on Synology DSM
- Document NFS permissions config for k8s-volumes share
- Include verification commands for testing NFS mount
- Bump minikube disk-size from 100g to 200g
- Add note explaining storage options (hostPath, NFS, SMB)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 17:09:49 -08:00
2c8ced07b4 Tighten podman ansible tasks based on manual testing
- Use 'started successfully' instead of just 'started' for changed_when
- Use a specific failed_when (rc not in [0, 125]) instead of failed_when: false
- 125 = already exists (init) or already running (start)

Tested manually on indri - podman machine initialized and running.
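
The tightened conditions above can be sketched outside Ansible as a small Python function. This is only an illustration of the logic, not the actual task file: the function name and the `output` parameter are hypothetical, and it assumes the "started successfully" message appears in the captured command output.

```python
# Return codes treated as non-failures, per the commit message:
# 0 = success, 125 = already exists (init) or already running (start).
OK_RETURN_CODES = {0, 125}


def evaluate_podman_result(rc: int, output: str) -> dict:
    """Mirror the failed_when / changed_when rules for a podman machine command.

    Hypothetical helper: `output` stands in for whatever text the task registers.
    """
    # failed_when: rc not in [0, 125]
    failed = rc not in OK_RETURN_CODES
    # changed_when: only the full phrase counts, not a bare "started"
    changed = "started successfully" in output
    return {"failed": failed, "changed": changed}
```

Matching the full phrase means output that merely mentions "started" (for example an "already started" notice) is not reported as a change.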

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 16:58:35 -08:00
b333b7ff2c Move plan to plans/ directory, add completion step
- Rename docs/k8s-migration.md to plans/k8s-migration.md
- Create plans/completed/ for finished plans
- Add Plan Completion section with instructions to archive when done

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 16:47:29 -08:00
b703abe4d1 Remove manual alloy restart from Step 0.6
Ansible handler restarts alloy automatically when config changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 16:37:50 -08:00
bf1664d117 Move tailscale URL tests from Step 0.3 to Step 0.4
registry.tail8d86e.ts.net isn't available until tailscale serve
is configured in Step 0.4. Keep localhost tests in Step 0.3.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 16:31:35 -08:00
cff951a0f9 Add quay.io to zot sync config and namespace convention
Config template and namespace docs now match defaults/main.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 16:11:43 -08:00
9edecf78dd Defer tag:k8s to Phase 1, clarify kubeconfig setup
- Remove tag:k8s from Phase 0 Step 0.1 (not needed until Tailscale
  Kubernetes Operator is deployed)
- Add tag:k8s ACL setup as new Step 1 in Phase 1
- Clarify Step 0.10: no special Tailscale service needed for K8s API
  (admin wildcard grant covers it)
- Add sed commands to replace localhost with indri in kubeconfig
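
As a rough equivalent of the sed substitution mentioned above, the following Python sketch rewrites the cluster server URL in kubeconfig text. The function name is hypothetical, and it assumes the server entries have the form `server: https://localhost[:port]`; the plan's actual sed commands may differ.

```python
import re

# Matches "server: https://localhost" at the start of a line (after indentation),
# followed by a port, a path, whitespace, or end of line.
_SERVER_LOCALHOST = re.compile(
    r"^(\s*server:\s*https://)localhost(?=[:/\s]|$)", re.MULTILINE
)


def point_kubeconfig_at_host(kubeconfig_text: str, host: str = "indri") -> str:
    """Replace 'localhost' in kubeconfig server URLs with `host`.

    Only `server:` lines are touched; other occurrences of 'localhost'
    (names, comments) are left alone.
    """
    return _SERVER_LOCALHOST.sub(lambda m: m.group(1) + host, kubeconfig_text)
```

Restricting the match to `server:` lines is a deliberate choice here, so that context or user names containing "localhost" are not rewritten by accident.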

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:29:09 -08:00
546fe08d9c Fix Zot paths in Technical Decisions section
Update to match Phase 0 details:
- Built from source, not homebrew
- Config at ~/.config/zot/config.json
- Data at ~/zot/
- Binary path documented

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:23:22 -08:00
6df7153766 Update Step 0.3 with verified zot build process
- Use localhost:3001 for forge clone (hairpinning limitation)
- Document mise go@1.25 setup in repo directory
- Correct build command: mise x -- make binary
- Mark prerequisites as already completed with verification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:19:29 -08:00
c9d7acfafe Fix example: mcquack is not a mirror, use devpi instead
mcquack is Erich's own project, not a third-party mirror.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:01:36 -08:00
6d84ff7bca Use forge mirror for zot, add third-party project guidance
- Updated Step 0.3 to clone zot from forge mirror instead of GitHub
- Added "Third-Party Projects" section to CLAUDE.md explaining:
  - Ask user to mirror 3rd party repos to forge first
  - Clone from mirror to ~/code/3rd/
  - Avoids external dependencies

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:00:54 -08:00
f064ba3afa Update Zot installation: clone to ~/code/3rd/ and build from source
Zot isn't in homebrew. Following existing pattern (like kiwix-tools),
clone to ~/code/3rd/zot on indri and build with 'make binary'.
Updated defaults and LaunchAgent template to use built binary path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:56:17 -08:00
adba123ad4 Document both registry modes: pull-through cache + private images
- Added Zot config.json template showing sync extension for pull-through
- Documented namespace convention:
  - registry.../docker.io/* → cached from Docker Hub
  - registry.../ghcr.io/* → cached from GHCR
  - registry.../blumeops/* → private images
- Added testing steps for both pull-through and private push
- Updated zk template with namespace table and build/push commands
- Updated verification checklist
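
The namespace convention above can be illustrated with a short Python sketch that maps image references onto registry paths. The function names and the `registry.example` host used in the usage note are hypothetical; only the `docker.io`, `ghcr.io`, and `blumeops` prefixes come from this commit (a later commit in this list also adds `quay.io`).

```python
# Upstreams served as a pull-through cache at the time of this commit.
PULL_THROUGH_UPSTREAMS = ("docker.io", "ghcr.io")
# Namespace reserved for privately built images.
PRIVATE_NAMESPACE = "blumeops"


def cached_ref(registry_host: str, image: str) -> str:
    """Map an upstream ref like 'ghcr.io/org/app:1.0' to its cached path."""
    upstream = image.split("/", 1)[0]
    if upstream not in PULL_THROUGH_UPSTREAMS:
        raise ValueError(f"no pull-through namespace for upstream {upstream!r}")
    # The upstream hostname becomes the first path segment on the registry.
    return f"{registry_host}/{image}"


def private_ref(registry_host: str, name: str) -> str:
    """Build the path for a private image under the blumeops namespace."""
    return f"{registry_host}/{PRIVATE_NAMESPACE}/{name}"
```

For example, `cached_ref("registry.example", "docker.io/library/nginx:1.27")` yields `registry.example/docker.io/library/nginx:1.27`, and `private_ref("registry.example", "myapp:dev")` yields `registry.example/blumeops/myapp:dev`.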

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:51:46 -08:00
950604bf25 Add tag:k8s grant for registry access (Woodpecker CI)
K8s workloads (like Woodpecker CI) need to push/pull images from Zot.
They'll get Tailscale identity via the operator (Phase 1) with tag:k8s.
Added grant and test case for tag:k8s → tag:registry access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:47:24 -08:00
97dce31171 Remove member grant for registry - admins only
Registry access restricted to admins (who already have full access).
Members don't need to push/pull container images.
K8s accesses registry locally on indri, not via Tailscale.
Added note about Zot htpasswd auth for future reference.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:45:03 -08:00
ee42f0f1a2 Fix Step 0.1: Use correct policy.hujson structure
- Use 'grants' (the newer format) instead of 'acls'
- Show exact line numbers and locations for each change
- Include tagOwners, grants, and tests sections
- Follow existing pattern with tag:blumeops in tagOwners

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:42:19 -08:00
26113aee42 Remove Brewfile from Phase 0 (it's for gilbert tooling only)
Brewfile is for development tooling on gilbert, not for indri services.
Ansible roles handle homebrew installations on indri directly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:33:09 -08:00
ace4822305 Expand Phase 0 with detailed implementation steps
- Add 16 numbered steps with specific files, code, and testing commands
- Add Tailscale service creation order warning (must create in admin
  console BEFORE running tailscale serve)
- Add comprehensive verification checklist and rollback procedures
- Document indri-services-check updates for zot and minikube
- Include zk documentation templates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:27:04 -08:00
4d916a46d3 Add Kubernetes migration plan documentation
Comprehensive phased plan for migrating blumeops services from direct
hosting on indri to a minikube cluster. Documents technical decisions
(Zot registry, Podman driver, CloudNativePG, Tailscale Operator) and
9 migration phases with verification and rollback procedures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:12:09 -08:00