K8s Migration Phase 1: Infrastructure Setup #29

Merged
eblume merged 20 commits from feature/k8s-phase1-kickoff into main 2026-01-19 09:49:53 -08:00

Summary

  • Split k8s migration plan into phases folder for easier navigation
  • Added tag:k8s to Pulumi ACLs for Kubernetes workloads
  • Phase 1 work in progress

Phase 1 Goals

  • Tailscale Kubernetes Operator
  • CloudNativePG Operator
  • PostgreSQL cluster for future app migrations

Deployment and Testing

  • [x] Review Phase 1 plan
  • [x] mise run tailnet-preview to verify ACL changes
  • [x] mise run tailnet-up to apply ACL changes
  • [x] Create Tailscale OAuth client (manual)
  • [x] Deploy operators and PostgreSQL cluster

🤖 Generated with Claude Code

Reorganized the monolithic migration plan into separate files:
- 00_overview.md: Architecture, technical decisions, shared info
- P0_foundation.complete.md: Phase 0 (complete)
- P1_k8s_infrastructure.md: Phase 1 (in progress)
- P2-P9: Remaining phases (pending)

This makes the plan easier to navigate and track progress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added:
- tag:k8s to tagOwners for k8s workload management
- Grant for tag:k8s -> tag:registry access (for CI pushing images)
- ACL test case for k8s registry access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ACL changes:
- Added tag:k8s-operator for the Tailscale K8s Operator
- Made tag:k8s-operator an owner of tag:k8s so the operator can
  assign that tag to resources it creates

Phase 1 plan updates:
- Added Kubernetes Tags Overview section explaining all three tags
- Expanded OAuth client creation instructions
- Added 1Password storage instructions
- Added verification and rollback sections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major updates to Phase 1:
- Added ArgoCD deployment as step 4 (exposed at argocd.tail8d86e.ts.net)
- Bootstrap pattern: Tailscale operator deployed first via kubectl,
  then ArgoCD takes over management of all components
- App-of-apps pattern with argocd/apps/ and argocd/manifests/ structure
- PostgreSQL migration strategy documented (zero-downtime switchover)
- Using GitHub mirror for ArgoCD git source (public, no auth needed)

New Phase 1 steps:
1. Update Pulumi ACLs ✓
2. Create Tailscale OAuth client ✓
3. Deploy Tailscale operator (bootstrap)
4. Deploy ArgoCD
5. Migrate Tailscale operator to ArgoCD
6. Deploy CloudNativePG via ArgoCD
7. Create PostgreSQL cluster via ArgoCD
8. Create app-of-apps root

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added:
- argocd/manifests/tailscale-operator/operator.yaml - from Tailscale repo
  - Removed embedded secret (managed separately)
  - Changed image to docker.io/tailscale/k8s-operator:stable for CRI-O
- argocd/manifests/tailscale-operator/secret.yaml.tpl - 1Password template
- argocd/manifests/tailscale-operator/README.md - deployment instructions
- .yamllint.yaml - exclude third-party operator.yaml files

The OAuth client needs the Devices write scope, tagged with tag:k8s-operator.
The operator assigns tag:k8s to resources it creates via ACL ownership.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed from wildcard pattern to specific file path per review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CRI-O cannot resolve short image names like 'tailscale/tailscale:stable'.
The ProxyClass 'default' sets fully-qualified image references.

Services must use annotation: tailscale.com/proxy-class: "default"
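
To illustrate the short-name problem described above, here is a minimal sketch of how a short image name differs from a fully-qualified one. It is not the ProxyClass itself and not part of this PR. The helper names `is_fully_qualified` and `qualify` are hypothetical, and the rule used (the first path component counts as a registry only if it contains a dot, a colon, or is `localhost`) is the common container-reference convention, assumed here rather than taken from CRI-O's source.

```python
def is_fully_qualified(image: str) -> bool:
    """Return True if the reference starts with a registry host.

    Assumed convention: the first path component is a registry only if it
    contains '.' or ':' or is exactly 'localhost'.
    """
    first, sep, _rest = image.partition("/")
    if not sep:
        # No slash at all, e.g. "nginx" or "nginx:1.27": no registry component.
        return False
    return "." in first or ":" in first or first == "localhost"


def qualify(image: str, default_registry: str = "docker.io") -> str:
    """Prefix a short name with a registry; leave qualified names untouched."""
    if is_fully_qualified(image):
        return image
    if default_registry == "docker.io" and "/" not in image:
        # Docker Hub official images live under the implicit "library/" namespace.
        image = f"library/{image}"
    return f"{default_registry}/{image}"


print(qualify("tailscale/tailscale:stable"))
```

Under these assumptions, `tailscale/tailscale:stable` is treated as a short name, while `docker.io/tailscale/k8s-operator:stable` is already fully qualified and is returned unchanged.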

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Uses kustomize with remote base from upstream ArgoCD
- Adds Tailscale LoadBalancer service for external access
- Exposed at https://argocd.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch from LoadBalancer to Ingress for automatic TLS certs
- Add ConfigMap patch to disable internal HTTPS redirect
- Tailscale Ingress provides Let's Encrypt certificates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds ArgoCD Application to manage Tailscale operator from forge:
- ArgoCD Application sourced from internal Forgejo via SSH
- DNS config for cluster-to-tailnet name resolution
- Egress proxy for accessing forge on indri
- ACL grants for k8s workloads to reach forge (ports 3001, 2200)
- Template for repository secret with 1Password SSH key reference

Key discovery: 1Password op read requires ?ssh-format=openssh parameter
to get keys in OpenSSH format that ArgoCD's SSH client can read.
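
The `?ssh-format=openssh` discovery can be sketched as a small helper that builds the secret reference and the `op read` invocation. This is an illustration only: the vault and item names are placeholders, the field name `private key` is assumed to be the field on the 1Password SSH key item, and the function names are hypothetical rather than taken from this repository.

```python
import subprocess


def ssh_key_reference(vault: str, item: str, field: str = "private key") -> str:
    """Build an op:// secret reference that requests the key in OpenSSH format."""
    return f"op://{vault}/{item}/{field}?ssh-format=openssh"


def op_read_command(reference: str) -> list[str]:
    """Return the argv for reading a secret reference with the 1Password CLI."""
    return ["op", "read", reference]


def read_secret(reference: str) -> str:
    """Run `op read` and return its stdout (needs a signed-in `op` CLI)."""
    result = subprocess.run(
        op_read_command(reference), check=True, capture_output=True, text=True
    )
    return result.stdout


# Placeholder vault and item names, for illustration only.
print(op_read_command(ssh_key_reference("ExampleVault", "example-ssh-key")))
```

Without the query parameter, the key may come back in a format that ArgoCD's SSH client cannot parse, which is the failure this commit works around.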

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add argocd CLI to Brewfile
- Create argocd.yaml for ArgoCD self-management (manual sync)
- Create apps.yaml app-of-apps root (auto-sync for Application resources)
- Convert tailscale-operator to kustomize
- Update READMEs with bootstrap steps and ArgoCD CLI commands
- Change all workload Applications to manual sync policy

App-of-apps auto-syncs to pick up new Application manifests, but child
apps require manual sync for actual workload deployments.
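
The split between an auto-syncing root and manually synced children could look roughly like the sketch below, which builds ArgoCD `Application` manifests as plain dictionaries. The repository URL, the child path, and the `application` helper are hypothetical; only the `argocd/apps` path and the sync behaviour come from the commit message. In ArgoCD, omitting `syncPolicy.automated` leaves an Application on manual sync.

```python
import json


def application(name: str, repo_url: str, path: str, auto_sync: bool,
                dest_namespace: str = "argocd") -> dict:
    """Build a minimal ArgoCD Application manifest as a dict."""
    spec = {
        "project": "default",
        "source": {"repoURL": repo_url, "targetRevision": "HEAD", "path": path},
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": dest_namespace,
        },
    }
    if auto_sync:
        # An empty `automated` block turns on automatic sync with default options.
        spec["syncPolicy"] = {"automated": {}}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": spec,
    }


REPO = "ssh://git@forge.example.invalid/owner/repo.git"  # placeholder URL

# Root app-of-apps: auto-syncs so new Application manifests are picked up.
root = application("apps", REPO, "argocd/apps", auto_sync=True)
# Child workload app: no automated policy, so deployments need a manual sync.
child = application("example-workload", REPO, "argocd/manifests/example",
                    auto_sync=False, dest_namespace="example")

print(json.dumps(root, indent=2))
```

Keeping workloads on manual sync means a merged manifest change only shows up as an out-of-sync Application until someone triggers the sync.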

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add CloudNativePG Application using multi-source Helm pattern
- Helm chart from upstream cloudnative-pg repo
- Values file from our git repo for customization
- Manual sync policy consistent with other workload apps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required to handle large CRDs that exceed the kubectl annotation size limit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configures minikube's CRI-O to use zot on indri as a pull-through cache
for docker.io, ghcr.io, and quay.io. Uses host.containers.internal:5050
which is stable across restarts.

This reduces external bandwidth, speeds up pulls, and provides resilience
if upstream registries are unreachable.
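
As a rough sketch of what such a mirror configuration might contain, the helper below renders `registries.conf` entries (containers-registries.conf v2 syntax) that point each upstream at a single mirror. The actual file in this PR may differ. The `insecure = true` setting and the assumption that one mirror location serves all three upstreams without a path prefix are guesses, and `registries_conf` is a hypothetical name.

```python
def registries_conf(mirror: str, upstreams: list[str], insecure: bool = True) -> str:
    """Render registries.conf blocks that route each upstream via one mirror."""
    blocks = []
    for upstream in upstreams:
        blocks.append(
            "[[registry]]\n"
            f'prefix = "{upstream}"\n'
            f'location = "{upstream}"\n'
            "\n"
            "[[registry.mirror]]\n"
            f'location = "{mirror}"\n'
            # Assumption: the local cache is served over plain HTTP.
            f"insecure = {str(insecure).lower()}\n"
        )
    return "\n".join(blocks)


conf = registries_conf(
    "host.containers.internal:5050",
    ["docker.io", "ghcr.io", "quay.io"],
)
print(conf)
```

With a mirror entry in place, the container runtime tries the mirror first and falls back to the upstream `location` if the mirror cannot serve the image.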

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create blumeops-pg Cluster with CloudNativePG
- Add eblume superuser role (matches current brew pg setup)
- Configure pg_hba for password auth from any IP (Tailscale handles security)
- Add secret template for eblume password from 1Password
- Create ArgoCD Application with manual sync policy
- Update Phase 1 plan with implementation notes
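
A CloudNativePG `Cluster` along the lines listed above might be shaped like the following sketch, again built as a plain dictionary. The instance count, storage size, secret name, and the exact `pg_hba` rule (including `scram-sha-256` as the password method) are assumptions for illustration. Only the cluster name `blumeops-pg`, the `eblume` superuser role, and password auth from any address come from the commit message.

```python
import json


def pg_cluster(name: str, role: str, password_secret: str,
               instances: int = 1, storage: str = "10Gi") -> dict:
    """Build a minimal CloudNativePG Cluster manifest as a dict."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "storage": {"size": storage},
            "postgresql": {
                # Password auth from any address; network access is assumed
                # to be restricted by the tailnet rather than by pg_hba.
                "pg_hba": ["host all all all scram-sha-256"],
            },
            "managed": {
                "roles": [{
                    "name": role,
                    "ensure": "present",
                    "login": True,
                    "superuser": True,
                    # Secret holding the role's password (name is a placeholder).
                    "passwordSecret": {"name": password_secret},
                }],
            },
        },
    }


cluster = pg_cluster("blumeops-pg", "eblume", "example-eblume-password")
print(json.dumps(cluster, indent=2))
```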

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Expose k8s-pg.tail8d86e.ts.net for testing during migration
- Temporary service until Phase 4 when pg.tail8d86e.ts.net switches
- Update README with connection info and cleanup notes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add proxy-class annotation to use default ProxyClass
- Fixes CRI-O image name resolution issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Set `default: true` on ProxyClass so all Services/Ingresses use it
- Remove explicit proxy-class annotation from databases service
- Fixes CRI-O short image name issue globally for Tailscale resources

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename from generic "default" to descriptive "crio-compat"
- Add detailed comments explaining why this ProxyClass exists
- Update all Service/Ingress annotations to use new name
- Remove invalid `default: true` field (not a real ProxyClass field)

The ProxyClass exists because CRI-O cannot resolve short image names.
Each Tailscale Service/Ingress needs the annotation to use it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Check ArgoCD healthz endpoint
- Check k8s-pg PostgreSQL via pg_isready
- Verify all ArgoCD apps are synced
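
The three checks above could be sketched as follows. This is not the script added in this commit. The function names are hypothetical, and the sketch assumes that ArgoCD answers on `/healthz`, that `pg_isready` is on the PATH, and that the app list is supplied as the JSON printed by `argocd app list -o json`.

```python
import json
import subprocess
import urllib.error
import urllib.request


def argocd_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def postgres_ready(host: str, port: int = 5432, run=subprocess.run) -> bool:
    """Return True if pg_isready exits 0 (server is accepting connections)."""
    result = run(["pg_isready", "-h", host, "-p", str(port)], check=False)
    return result.returncode == 0


def unsynced_apps(app_list_json: str) -> list[str]:
    """Return names of Applications whose sync status is not 'Synced'."""
    apps = json.loads(app_list_json)
    return [
        app["metadata"]["name"]
        for app in apps
        if app.get("status", {}).get("sync", {}).get("status") != "Synced"
    ]
```

The `run` parameter exists only so the PostgreSQL check can be exercised without a real server, and an empty list from `unsynced_apps` means every Application reported `Synced`.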

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
eblume merged commit a8f4d00294 into main 2026-01-19 09:49:53 -08:00
Reference
eblume/blumeops!29