K8s Migration Phase 1: Infrastructure Setup #29

Merged
eblume merged 20 commits from feature/k8s-phase1-kickoff into main 2026-01-19 09:49:53 -08:00

Author SHA1 Message Date
92306c7953 Add ArgoCD and k8s-pg to service health checks
- Check ArgoCD healthz endpoint
- Check k8s-pg PostgreSQL via pg_isready
- Verify all ArgoCD apps are synced

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:45:11 -08:00
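
The "all ArgoCD apps are synced" check in 92306c7953 could be sketched roughly as below. This is not the script from the commit: it assumes the check consumes the output of `argocd app list -o json`, which is a JSON array of Application objects, and reads `status.sync.status` from each one. The function name and the sample data are hypothetical.

```python
import json


def unsynced_apps(app_list_json: str) -> list[str]:
    """Return the names of Applications whose sync status is not "Synced".

    Assumes the input is the JSON array printed by `argocd app list -o json`.
    """
    apps = json.loads(app_list_json)
    return [
        app["metadata"]["name"]
        for app in apps
        # A missing status block is treated as "not synced".
        if app.get("status", {}).get("sync", {}).get("status") != "Synced"
    ]


# Hypothetical sample input, trimmed to the fields the check reads.
sample = json.dumps([
    {"metadata": {"name": "apps"}, "status": {"sync": {"status": "Synced"}}},
    {"metadata": {"name": "blumeops-pg"}, "status": {"sync": {"status": "OutOfSync"}}},
])
print(unsynced_apps(sample))
```

An empty result means every app is synced. With manual sync policies on the workload apps, a non-empty list is probably the common case right after a merge.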
739f2f7da5 Rename ProxyClass to crio-compat with documentation
- Rename from generic "default" to descriptive "crio-compat"
- Add detailed comments explaining why this ProxyClass exists
- Update all Service/Ingress annotations to use new name
- Remove invalid `default: true` field (not a real ProxyClass field)

The ProxyClass exists because CRI-O cannot resolve short image names.
Each Tailscale Service/Ingress must carry the tailscale.com/proxy-class
annotation to use it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:40:11 -08:00
9cd126243c Make ProxyClass default for all Tailscale proxies
- Set `default: true` on ProxyClass so all Services/Ingresses use it
- Remove explicit proxy-class annotation from databases service
- Fixes CRI-O short image name issue globally for Tailscale resources

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:23:29 -08:00
ecf2aeb4e8 Fix PostgreSQL Tailscale service ProxyClass
- Add proxy-class annotation to use default ProxyClass
- Fixes CRI-O image name resolution issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:16:39 -08:00
9854b4dbee Add Tailscale LoadBalancer for PostgreSQL testing
- Expose k8s-pg.tail8d86e.ts.net for testing during migration
- Temporary service until Phase 4 when pg.tail8d86e.ts.net switches
- Update README with connection info and cleanup notes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:14:34 -08:00
d75fdfdad6 Add PostgreSQL cluster manifest for Step 7
- Create blumeops-pg Cluster with CloudNativePG
- Add eblume superuser role (matches current brew pg setup)
- Configure pg_hba for password auth from any IP (Tailscale handles security)
- Add secret template for eblume password from 1Password
- Create ArgoCD Application with manual sync policy
- Update Phase 1 plan with implementation notes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 08:55:08 -08:00
1c172702ec Add CRI-O registry mirror config for zot pull-through cache
Configures minikube's CRI-O to use zot on indri as a pull-through cache
for docker.io, ghcr.io, and quay.io. Uses host.containers.internal:5050
which is stable across restarts.

This reduces external bandwidth, speeds up pulls, and provides resilience
if upstream registries are unreachable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 08:37:01 -08:00
a9a667cd81 Enable ServerSideApply for CloudNativePG
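Required because the CloudNativePG CRDs are too large for client-side apply:
the `kubectl.kubernetes.io/last-applied-configuration` annotation would exceed
the annotation size limit.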

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 08:11:20 -08:00
1bdfca0f22 Add CloudNativePG operator via ArgoCD (Phase 1 Step 6)
- Add CloudNativePG Application using multi-source Helm pattern
- Helm chart from upstream cloudnative-pg repo
- Values file from our git repo for customization
- Manual sync policy consistent with other workload apps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 08:06:33 -08:00
32d5927838 Add ArgoCD self-management and app-of-apps pattern
- Add argocd CLI to Brewfile
- Create argocd.yaml for ArgoCD self-management (manual sync)
- Create apps.yaml app-of-apps root (auto-sync for Application resources)
- Convert tailscale-operator to kustomize
- Update READMEs with bootstrap steps and ArgoCD CLI commands
- Change all workload Applications to manual sync policy

App-of-apps auto-syncs to pick up new Application manifests, but child
apps require manual sync for actual workload deployments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 07:54:04 -08:00
c47ac189c9 Migrate Tailscale operator to ArgoCD management (Phase 1 Step 5)
Adds ArgoCD Application to manage Tailscale operator from forge:
- ArgoCD Application sourced from internal Forgejo via SSH
- DNS config for cluster-to-tailnet name resolution
- Egress proxy for accessing forge on indri
- ACL grants for k8s workloads to reach forge (ports 3001, 2200)
- Template for repository secret with 1Password SSH key reference

Key discovery: 1Password's `op read` requires the `?ssh-format=openssh` query
parameter on the secret reference to return keys in the OpenSSH format that
ArgoCD's SSH client can read.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 07:12:51 -08:00
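
As an illustration of the `?ssh-format=openssh` discovery in c47ac189c9, here is a minimal sketch of building such a secret reference and passing it to `op read`. The helper names, and any vault or item names you pass in, are hypothetical. The field name `private key` is assumed to be the one used by 1Password SSH Key items.

```python
import subprocess


def ssh_key_reference(vault: str, item: str, field: str = "private key") -> str:
    """Build an op:// secret reference that asks for OpenSSH key format."""
    return f"op://{vault}/{item}/{field}?ssh-format=openssh"


def read_ssh_key(vault: str, item: str) -> str:
    """Fetch the private key via the 1Password CLI (requires a signed-in `op`)."""
    result = subprocess.run(
        ["op", "read", ssh_key_reference(vault, item)],
        check=True,           # raise if op exits non-zero
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the reference as a list element avoids shell quoting problems with the space in the field name.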
d510374432 Fix ArgoCD TLS: use Ingress with Let's Encrypt
- Switch from LoadBalancer to Ingress for automatic TLS certs
- Add ConfigMap patch to disable internal HTTPS redirect
- Tailscale Ingress provides Let's Encrypt certificates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:00:27 -08:00
fc72021d6b Add ArgoCD manifests with Tailscale exposure
- Uses kustomize with remote base from upstream ArgoCD
- Adds Tailscale LoadBalancer service for external access
- Exposed at https://argocd.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:41:59 -08:00
950a3a6cc3 Add ProxyClass for CRI-O image compatibility
CRI-O cannot resolve short image names like 'tailscale/tailscale:stable'.
The ProxyClass 'default' sets fully-qualified image references.

Services must use annotation: tailscale.com/proxy-class: "default"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 18:50:27 -08:00
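
For context on the short-name problem in 950a3a6cc3: Docker-style tooling expands a short name such as `tailscale/tailscale:stable` to a Docker Hub reference, which the commit reports CRI-O does not do here. The sketch below mimics that expansion convention to show what "fully-qualified" means. It is illustrative only and is not code from the PR or from CRI-O; `qualify_image` is a hypothetical helper.

```python
def qualify_image(ref: str, default_registry: str = "docker.io") -> str:
    """Return a fully-qualified image reference, Docker-convention style."""
    first, sep, _rest = ref.partition("/")
    # The first path component counts as a registry host only if it looks
    # like one: it contains a dot or a port colon, or is "localhost".
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        return ref
    # Single-component names on Docker Hub live under the "library/" namespace.
    if not sep and default_registry == "docker.io":
        ref = f"library/{ref}"
    return f"{default_registry}/{ref}"


print(qualify_image("tailscale/tailscale:stable"))
# prints docker.io/tailscale/tailscale:stable
```

References that already name a registry, such as `docker.io/tailscale/k8s-operator:stable`, pass through unchanged.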
92de2c3909 Exclude specific tailscale-operator file from yamllint
Changed from wildcard pattern to specific file path per review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 18:35:20 -08:00
e017117449 Add Tailscale operator manifests (Phase 1 Step 3)
Added:
- argocd/manifests/tailscale-operator/operator.yaml - from Tailscale repo
  - Removed embedded secret (managed separately)
  - Changed image to docker.io/tailscale/k8s-operator:stable for CRI-O
- argocd/manifests/tailscale-operator/secret.yaml.tpl - 1Password template
- argocd/manifests/tailscale-operator/README.md - deployment instructions
- .yamllint.yaml - exclude third-party operator.yaml files

The OAuth client requires the Devices write scope, tagged with tag:k8s-operator.
The operator assigns tag:k8s to the resources it creates, which it can do
because tag:k8s-operator owns tag:k8s in the ACL.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 18:33:29 -08:00
91cd7260fd Expand Phase 1 plan with ArgoCD and GitOps pattern
Major updates to Phase 1:
- Added ArgoCD deployment as step 4 (exposed at argocd.tail8d86e.ts.net)
- Bootstrap pattern: Tailscale operator deployed first via kubectl,
  then ArgoCD takes over management of all components
- App-of-apps pattern with argocd/apps/ and argocd/manifests/ structure
- PostgreSQL migration strategy documented (zero-downtime switchover)
- Using GitHub mirror for ArgoCD git source (public, no auth needed)

New Phase 1 steps:
1. Update Pulumi ACLs ✓
2. Create Tailscale OAuth client ✓
3. Deploy Tailscale operator (bootstrap)
4. Deploy ArgoCD
5. Migrate Tailscale operator to ArgoCD
6. Deploy CloudNativePG via ArgoCD
7. Create PostgreSQL cluster via ArgoCD
8. Create app-of-apps root

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 16:05:46 -08:00
fc54b9ad66 Add tag:k8s-operator and update Phase 1 plan
ACL changes:
- Added tag:k8s-operator for the Tailscale K8s Operator
- Made tag:k8s-operator an owner of tag:k8s so the operator can
  assign that tag to resources it creates

Phase 1 plan updates:
- Added Kubernetes Tags Overview section explaining all three tags
- Expanded OAuth client creation instructions
- Added 1Password storage instructions
- Added verification and rollback sections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 15:35:58 -08:00
4ef8e24ec6 Add tag:k8s for Kubernetes workloads (Phase 1 Step 1)
Added:
- tag:k8s to tagOwners for k8s workload management
- Grant for tag:k8s -> tag:registry access (for CI pushing images)
- ACL test case for k8s registry access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 15:24:28 -08:00
0db4abe64d Split k8s migration plan into phases folder
Reorganized the monolithic migration plan into separate files:
- 00_overview.md: Architecture, technical decisions, shared info
- P0_foundation.complete.md: Phase 0 (complete)
- P1_k8s_infrastructure.md: Phase 1 (in progress)
- P2-P9: Remaining phases (pending)

This makes the plan easier to navigate and track progress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 15:21:35 -08:00