From 91cd7260fd07b7825376930f7ef10e26d2ad7e38 Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Sun, 18 Jan 2026 16:05:46 -0800
Subject: [PATCH] Expand Phase 1 plan with ArgoCD and GitOps pattern
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major updates to Phase 1:
- Added ArgoCD deployment as step 4 (exposed at argocd.tail8d86e.ts.net)
- Bootstrap pattern: Tailscale operator deployed first via kubectl, then ArgoCD takes over management of all components
- App-of-apps pattern with argocd/apps/ and argocd/manifests/ structure
- PostgreSQL migration strategy documented (zero-downtime switchover)
- Using GitHub mirror for ArgoCD git source (public, no auth needed)

New Phase 1 steps:
1. Update Pulumi ACLs ✓
2. Create Tailscale OAuth client ✓
3. Deploy Tailscale operator (bootstrap)
4. Deploy ArgoCD
5. Migrate Tailscale operator to ArgoCD
6. Deploy CloudNativePG via ArgoCD
7. Create PostgreSQL cluster via ArgoCD
8. Create app-of-apps root

Co-Authored-By: Claude Opus 4.5
---
 plans/k8s-migration/00_overview.md           |  16 +-
 plans/k8s-migration/P1_k8s_infrastructure.md | 491 +++++++++++++++----
 2 files changed, 393 insertions(+), 114 deletions(-)

diff --git a/plans/k8s-migration/00_overview.md b/plans/k8s-migration/00_overview.md
index 16c3c1b..514206b 100644
--- a/plans/k8s-migration/00_overview.md
+++ b/plans/k8s-migration/00_overview.md
@@ -7,14 +7,14 @@ This plan details a phased migration of blumeops services from direct hosting on
 | Phase | Name | Status | Description |
 |-------|------|--------|-------------|
 | 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
-| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | In Progress | Tailscale operator + CloudNativePG |
-| 2 | [Grafana](P2_grafana.md) | Pending | Migrate Grafana (pilot) |
-| 3 | [PostgreSQL](P3_postgresql.md) | Pending | Migrate to CloudNativePG |
-| 4 | [Miniflux](P4_miniflux.md) | Pending | Migrate Miniflux |
-| 5 | [devpi](P5_devpi.md) | Pending | Migrate devpi |
-| 6 | [Kiwix](P6_kiwix.md) | Pending | Migrate Kiwix |
-| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) |
-| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD |
+| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | In Progress | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
+| 2 | [Grafana](P2_grafana.md) | Pending | Migrate Grafana (pilot) via ArgoCD |
+| 3 | [PostgreSQL](P3_postgresql.md) | Pending | Data migration to k8s PostgreSQL |
+| 4 | [Miniflux](P4_miniflux.md) | Pending | Migrate Miniflux via ArgoCD |
+| 5 | [devpi](P5_devpi.md) | Pending | Migrate devpi via ArgoCD |
+| 6 | [Kiwix](P6_kiwix.md) | Pending | Migrate Kiwix via ArgoCD |
+| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) via ArgoCD |
+| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD via ArgoCD |
 | 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |
 
 ## Architecture Overview
diff --git a/plans/k8s-migration/P1_k8s_infrastructure.md b/plans/k8s-migration/P1_k8s_infrastructure.md
index 2066a1c..f05a7fa 100644
--- a/plans/k8s-migration/P1_k8s_infrastructure.md
+++ b/plans/k8s-migration/P1_k8s_infrastructure.md
@@ -1,6 +1,6 @@
 # Phase 1: Kubernetes Infrastructure
 
-**Goal**: Tailscale operator + CloudNativePG operator
+**Goal**: Tailscale operator, ArgoCD, CloudNativePG operator, PostgreSQL cluster
 
 **Status**: In Progress
 
@@ -8,9 +8,22 @@
 
 ---
 
-## Kubernetes Tags Overview
+## Overview
 
-Phase 1 introduces three Tailscale tags for Kubernetes:
+Phase 1 establishes the k8s control plane infrastructure:
+1. **Tailscale operator** - Exposes services on the tailnet
+2. **ArgoCD** - GitOps continuous delivery
+3. **CloudNativePG** - PostgreSQL operator
+4. **PostgreSQL cluster** - Database for future app migrations
+
+The deployment follows a bootstrap pattern:
+- First two components deployed via `kubectl apply -k` (no GitOps yet)
+- ArgoCD then takes over management of all components including itself
+- All subsequent deployments use ArgoCD
+
+---
+
+## Kubernetes Tags Overview
 
 | Tag | Purpose | Applied To |
 |-----|---------|------------|
@@ -22,118 +35,278 @@ Phase 1 introduces three Tailscale tags for Kubernetes:
 
 ---
 
+## PostgreSQL Migration Strategy
+
+The k8s PostgreSQL cluster will eventually replace the brew PostgreSQL on indri.
+
+| Phase | `pg.tail8d86e.ts.net` points to | Miniflux connects to |
+|-------|--------------------------------|---------------------|
+| Current | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` |
+| Phase 1 | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` (no change) |
+| Phase 4 | brew PostgreSQL (indri) | k8s PG (internal, after miniflux migrates to k8s) |
+| Post-Phase 4 | k8s PostgreSQL | k8s PG (internal) |
+| Cleanup | k8s PostgreSQL | k8s PG (internal) |
+
+This allows zero-downtime migration - the Tailscale service switches after apps are migrated.
+
+---
+
 ## Steps
 
-### 1. Update Pulumi ACLs for k8s workloads
+### 1. Update Pulumi ACLs for k8s workloads ✓
 
-Add the operator and workload tags to `pulumi/policy.hujson`.
+**Status**: Complete
 
-**Changes to tagOwners:**
-```hujson
-// Tailscale K8s Operator tags (Phase 1)
-"tag:k8s-operator": ["autogroup:admin", "tag:blumeops"],
-"tag:k8s": ["autogroup:admin", "tag:blumeops", "tag:k8s-operator"],
+Added to `pulumi/policy.hujson`:
+- `tag:k8s-operator` - for the operator OAuth client
+- `tag:k8s` - for operator-managed resources (owned by `tag:k8s-operator`)
+- Grant for `tag:k8s` → `tag:registry` access
+
+---
+
+### 2. Create Tailscale OAuth client ✓
+
+**Status**: Complete
+
+OAuth client stored in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `2it22lavwgbxdskoaxanej354q`)
+
+**Configuration used:**
+- Tags: `tag:k8s-operator`
+- Devices write scope tag: `tag:k8s`
+- Scopes: Devices Core (R/W), Auth Keys (R/W), Services (Write)
+
+---
+
+### 3. Deploy Tailscale Kubernetes Operator (Bootstrap)
+
+Deploy via `kubectl apply -k` - will be migrated to ArgoCD management in Step 5.
+
+**Setup manifests directory:**
+```bash
+mkdir -p argocd/manifests/tailscale-operator
+cd argocd/manifests/tailscale-operator
+
+# Download static manifest from Tailscale repo
+curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/manifests/operator.yaml -o operator.yaml
+
+# Download CRDs
+curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_connectors.yaml -o crds/connectors.yaml
+curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_proxyclasses.yaml -o crds/proxyclasses.yaml
+# ... (other CRDs as needed)
 ```
 
-**Add grant for k8s→registry access:**
-```hujson
-// k8s workloads (e.g., Woodpecker CI) can push/pull from registry
-{
-  "src": ["tag:k8s"],
-  "dst": ["tag:registry"],
-  "ip": ["tcp:443"],
-},
-```
-
-**Add test case:**
-```hujson
-{
-  "src": "tag:k8s",
-  "accept": ["tag:registry:443"],
-},
+**Create kustomization.yaml:**
+```yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: tailscale-system
+resources:
+  - operator.yaml
+secretGenerator:
+  - name: operator-oauth
+    namespace: tailscale-system
+    literals:
+      - client_id=PLACEHOLDER
+      - client_secret=PLACEHOLDER
+generatorOptions:
+  disableNameSuffixHash: true
 ```
 
 **Deploy:**
 ```bash
-mise run tailnet-preview  # Review changes
-mise run tailnet-up  # Apply changes
-```
+# Get credentials from 1Password and create secret manually (kustomize secretGenerator is for reference)
+CLIENT_ID=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-id --reveal)
+CLIENT_SECRET=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-secret --reveal)
 
----
+kubectl create namespace tailscale-system
+kubectl create secret generic operator-oauth \
+  --namespace tailscale-system \
+  --from-literal=client_id=$CLIENT_ID \
+  --from-literal=client_secret=$CLIENT_SECRET
 
-### 2. Create Tailscale OAuth client (MANUAL)
-
-Go to https://login.tailscale.com/admin/settings/oauth and create an OAuth client:
-
-**Configuration:**
-- **Description**: `k8s-operator`
-- **Tags**: `tag:k8s-operator`
-- **Scopes**:
-  - Devices: Core (Read & Write)
-  - Auth Keys: Read & Write
-  - Services: Write
-
-**After creation:**
-1. Copy the Client ID and Client Secret
-2. Store in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`)
-   - Item name: `Tailscale K8s Operator OAuth`
-   - Fields: `client-id`, `client-secret`
-
----
-
-### 3. Deploy Tailscale Kubernetes Operator
-
-```bash
-# Add helm repo
-helm repo add tailscale https://pkgs.tailscale.com/helmcharts
-helm repo update
-
-# Get credentials from 1Password
-CLIENT_ID=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get "Tailscale K8s Operator OAuth" --fields client-id --reveal)
-CLIENT_SECRET=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get "Tailscale K8s Operator OAuth" --fields client-secret --reveal)
-
-# Install operator
-helm install tailscale-operator tailscale/tailscale-operator \
-  --namespace tailscale-system --create-namespace \
-  --set oauth.clientId=$CLIENT_ID \
-  --set oauth.clientSecret=$CLIENT_SECRET
+# Apply operator manifests
+kubectl apply -k argocd/manifests/tailscale-operator/
 ```
 
 **Verification:**
 ```bash
 kubectl get pods -n tailscale-system
-# Expected: tailscale-operator pod Running
+# Expected: operator pod Running
 
-# Check operator logs
 kubectl logs -n tailscale-system -l app.kubernetes.io/name=tailscale-operator
 ```
 
 ---
 
-### 4. Deploy CloudNativePG operator
+### 4. Deploy ArgoCD
 
+Deploy ArgoCD and expose via Tailscale as `argocd.tail8d86e.ts.net`.
+
+**Prerequisites:**
+- Add `tag:argocd` to Pulumi ACLs
+- Create Tailscale service `argocd` in admin console
+
+**Setup manifests:**
 ```bash
-kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
+mkdir -p argocd/manifests/argocd
+
+# Download ArgoCD install manifest
+curl -sL https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml -o argocd/manifests/argocd/install.yaml
+```
+
+**Create kustomization.yaml:**
+```yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: argocd
+resources:
+  - install.yaml
+  - service-tailscale.yaml  # LoadBalancer for Tailscale exposure
+```
+
+**Create service-tailscale.yaml:**
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: argocd-server-tailscale
+  namespace: argocd
+  annotations:
+    tailscale.com/hostname: "argocd"
+spec:
+  type: LoadBalancer
+  loadBalancerClass: tailscale
+  selector:
+    app.kubernetes.io/name: argocd-server
+  ports:
+    - name: https
+      port: 443
+      targetPort: 8080
+```
+
+**Deploy:**
+```bash
+kubectl create namespace argocd
+kubectl apply -k argocd/manifests/argocd/
+```
+
+**Get initial admin password:**
+```bash
+kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
+```
+
+**Verification:**
+- https://argocd.tail8d86e.ts.net loads
+- Can login with admin /
+
+**Post-setup:**
+1. Change admin password, store in 1Password
+2. Configure git repo connection to `github.com/eblume/blumeops` (public, no auth needed)
+   - Note: Using GitHub mirror since ArgoCD can't easily reach forge without additional networking
+
+---
+
+### 5. Migrate Tailscale Operator to ArgoCD
+
+Create ArgoCD Application to manage the Tailscale operator.
+
+**Create argocd/apps/tailscale-operator.yaml:**
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: tailscale-operator
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://github.com/eblume/blumeops.git
+    targetRevision: main
+    path: argocd/manifests/tailscale-operator
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: tailscale-system
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+```
+
+**Apply:**
+```bash
+kubectl apply -f argocd/apps/tailscale-operator.yaml
+```
+
+**Note on secrets:** The OAuth secret was created manually in Step 3. For GitOps, consider:
+- Sealed Secrets
+- External Secrets Operator
+- SOPS
+
+For now, the secret remains manually managed outside of ArgoCD.
+
+---
+
+### 6. Deploy CloudNativePG via ArgoCD
+
+**Setup manifests:**
+```bash
+mkdir -p argocd/manifests/cloudnative-pg
+
+# Download CNPG operator manifest
+curl -sL https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml -o argocd/manifests/cloudnative-pg/operator.yaml
+```
+
+**Create kustomization.yaml:**
+```yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - operator.yaml
+```
+
+**Create ArgoCD Application (argocd/apps/cloudnative-pg.yaml):**
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: cloudnative-pg
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://github.com/eblume/blumeops.git
+    targetRevision: main
+    path: argocd/manifests/cloudnative-pg
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: cnpg-system
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+```
+
+**Apply:**
+```bash
+kubectl apply -f argocd/apps/cloudnative-pg.yaml
 ```
 
 **Verification:**
 ```bash
 kubectl get pods -n cnpg-system
-# Expected: cnpg-controller-manager pod Running
+# Expected: cnpg-controller-manager Running
 ```
 
 ---
 
-### 5. Create PostgreSQL cluster
+### 7. Create PostgreSQL Cluster via ArgoCD
 
-Create namespace and cluster manifest:
-
-```bash
-kubectl create namespace databases
-```
+Create the database cluster. **Not exposed via Tailscale yet** - internal only until apps migrate.
 
+**Create argocd/manifests/databases/blumeops-pg.yaml:**
 ```yaml
-# ansible/k8s/databases/blumeops-pg.yaml
 apiVersion: postgresql.cnpg.io/v1
 kind: Cluster
 metadata:
@@ -146,10 +319,48 @@ spec:
     storageClass: standard
   monitoring:
     enablePodMonitor: true
+  bootstrap:
+    initdb:
+      database: miniflux
+      owner: miniflux
 ```
 
+**Create kustomization.yaml:**
+```yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: databases
+resources:
+  - blumeops-pg.yaml
+```
+
+**Create ArgoCD Application (argocd/apps/blumeops-pg.yaml):**
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: blumeops-pg
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://github.com/eblume/blumeops.git
+    targetRevision: main
+    path: argocd/manifests/databases
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: databases
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+```
+
+**Apply:**
 ```bash
-kubectl apply -f ansible/k8s/databases/blumeops-pg.yaml
+kubectl apply -f argocd/apps/blumeops-pg.yaml
 ```
 
 **Verification:**
@@ -158,30 +369,92 @@ kubectl get cluster -n databases
 # Expected: blumeops-pg with STATUS "Cluster in healthy state"
 
 kubectl get pods -n databases
-# Expected: blumeops-pg-1 pod Running
+# Expected: blumeops-pg-1 Running
+
+# Get connection secret
+kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d
 ```
 
 ---
 
-### 6. Update Alloy config
+### 8. Create App-of-Apps Root Application
 
-Add kubernetes_sd_configs for k8s metrics scraping.
+Once all components are deployed, create a root application to manage all apps.
 
-**Files to modify:**
-- `ansible/roles/alloy/templates/config.alloy.j2`
+**Create argocd/apps/root.yaml:**
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: root
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://github.com/eblume/blumeops.git
+    targetRevision: main
+    path: argocd/apps
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: argocd
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+```
 
-**Changes:**
-- Add scrape config for CloudNativePG metrics
-- Add scrape config for Tailscale operator metrics (if exposed)
+**Apply:**
+```bash
+kubectl apply -f argocd/apps/root.yaml
+```
+
+Now ArgoCD manages itself and all other applications via the app-of-apps pattern.
 
 ---
 
-## New Files
+## New Files Summary
 
-| File | Purpose |
-|------|---------|
-| `ansible/k8s/operators/` | Operator deployment notes/scripts |
-| `ansible/k8s/databases/blumeops-pg.yaml` | PostgreSQL cluster manifest |
+```
+argocd/
+  apps/
+    root.yaml                    # App-of-apps root
+    tailscale-operator.yaml      # Tailscale operator app
+    cloudnative-pg.yaml          # CNPG operator app
+    blumeops-pg.yaml             # PostgreSQL cluster app
+  manifests/
+    tailscale-operator/
+      kustomization.yaml
+      operator.yaml
+    argocd/
+      kustomization.yaml
+      install.yaml
+      service-tailscale.yaml
+    cloudnative-pg/
+      kustomization.yaml
+      operator.yaml
+    databases/
+      kustomization.yaml
+      blumeops-pg.yaml
+```
+
+---
+
+## Pulumi ACL Updates Required
+
+Add to `pulumi/policy.hujson`:
+```hujson
+"tag:argocd": ["autogroup:admin", "tag:blumeops"],
+```
+
+Add to Erich's test accept list:
+```hujson
+"accept": [..., "tag:argocd:443"],
+```
+
+Add to Allison's deny list:
+```hujson
+"deny": [..., "tag:argocd:443"],
+```
 
 ---
 
@@ -191,16 +464,18 @@ Add kubernetes_sd_configs for k8s metrics scraping.
 # 1. Tailscale operator running
 kubectl get pods -n tailscale-system
 
-# 2. CloudNativePG operator running
+# 2. ArgoCD accessible
+curl -k https://argocd.tail8d86e.ts.net/healthz
+
+# 3. CloudNativePG operator running
 kubectl get pods -n cnpg-system
 
-# 3. PostgreSQL cluster healthy
+# 4. PostgreSQL cluster healthy
 kubectl get cluster -n databases
-kubectl get pods -n databases
 
-# 4. Test database connection (from indri)
-kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d
-# Use the URI to connect via psql
+# 5. All ArgoCD apps synced
+kubectl get applications -n argocd
+# All should show STATUS: Synced, HEALTH: Healthy
 ```
 
 ---
@@ -208,15 +483,19 @@ kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base
 ## Rollback
 
 ```bash
-# Remove PostgreSQL cluster
-kubectl delete cluster -n databases blumeops-pg
+# Remove ArgoCD apps (will cascade delete managed resources)
+kubectl delete application -n argocd root
+kubectl delete application -n argocd blumeops-pg
+kubectl delete application -n argocd cloudnative-pg
+kubectl delete application -n argocd tailscale-operator
+
+# Remove ArgoCD
+kubectl delete -k argocd/manifests/argocd/
+kubectl delete namespace argocd
+
+# Remove namespaces
 kubectl delete namespace databases
-
-# Remove CloudNativePG operator
-kubectl delete -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
-
-# Remove Tailscale operator
-helm uninstall tailscale-operator -n tailscale-system
+kubectl delete namespace cnpg-system
 kubectl delete namespace tailscale-system
 
 # Revert ACL changes