Commit graph

24 commits

Author SHA1 Message Date
6a436d141a Update CI/CD plan: mark Phase 1 complete, add runner observability
All checks were successful
Test CI / test (push) Successful in 0s
- Mark Phase 1 (Enable Actions) as completed with date
- Check off all verification items in P1
- Add Step 6 to Phase 4 for runner logging and metrics
- Update overview table with status column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 17:10:14 -08:00
7893c41020 Enable Forgejo Actions (Phase 1) (#48)
All checks were successful
Test CI / test (push) Successful in 0s
## Summary
- Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password
- Enable Forgejo Actions in config (`[actions] ENABLED = true`)
- Add `repo.actions` to DEFAULT_REPO_UNITS
- Clean up unused MySQL database fields (we use SQLite)

## Phase 1 Progress
This PR covers the first part of Phase 1 (ci-cd-bootstrap plan):
- [x] Refactor app.ini to ansible template
- [x] Store secrets in 1Password
- [x] Enable Actions in config
- [ ] Deploy config changes (pending review)
- [ ] Create runner registration token
- [ ] Deploy runner to k8s
- [ ] Test with simple workflow

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags forgejo` to deploy
- [ ] Verify Forgejo restarts correctly
- [ ] Verify Actions tab appears in repo settings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48
2026-01-23 17:00:12 -08:00
016f1043c8 Retire k8s-migration plan and create ci-cd-bootstrap plan 2026-01-23 14:13:01 -08:00
6a140107c6 P7 forgejo plan updated 2026-01-21 20:04:18 -08:00
4dd74dfff8 complete P6 2026-01-21 19:16:04 -08:00
89eff26301 complete P5.1 2026-01-21 16:32:01 -08:00
21848a7919 P5.1: Migrate minikube from podman to QEMU2 driver (#38)
## Summary
- Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support
- Update ansible minikube role with qemu installation and containerd runtime
- Remove podman role dependency from indri.yml
- Add synology user creation steps and post-migration zot reconfiguration notes

## Why
Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support.

## Deployment and Testing
- [ ] Create k8s-storage user on Synology DSM
- [ ] Store credentials in 1Password (synology-k8s-storage)
- [ ] Export current k8s state
- [ ] Stop and delete podman-based minikube cluster
- [ ] Run ansible to create QEMU2 cluster
- [ ] Test NFS volume mount with test pod
- [ ] Redeploy ArgoCD and all apps
- [ ] Verify all services healthy
- [ ] Reconfigure zot registry mirrors for containerd (post-migration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38
2026-01-21 16:03:37 -08:00
7b60cca31e Document P6 blocker and add P5.1 QEMU2 migration plan (#37)
## Summary
- Document P6 (Kiwix/Transmission) blocker: podman driver cannot mount external volumes
- Add P5.1 plan to migrate minikube from podman to QEMU2 driver
- Update overview with corrected phase statuses and driver information

## Background

P6 implementation (`feature/p6-kiwix-transmission`) was completed but blocked because **all volume mount approaches failed** with the podman driver:

| Approach | Result |
|----------|--------|
| NFS volume | Failed - CAP_SYS_ADMIN required |
| SMB CSI driver | Failed - EPERM in rootless container |
| `minikube mount` (9p) | Failed - permission denied |
| hostPath | Failed - path doesn't exist in container |

Root cause: Podman driver runs minikube in a rootless container lacking kernel capabilities for filesystem mounts.

## What's Next

1. Merge this documentation PR
2. Execute P5.1 (QEMU2 migration) in a fresh session
3. Retry P6 with the QEMU2 driver

## Deployment and Testing
- [x] No deployment needed - documentation only
- [x] ArgoCD apps reset to main
- [x] Cluster healthy (except kiwix/transmission intentionally offline)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/37
2026-01-20 20:49:48 -08:00
b97d461a5a P6: Kiwix and Transmission migration planning (#35)
## Summary
- Detailed planning document for Phase 6 of k8s migration
- Transmission as standalone general-purpose torrent service with web UI at torrent.tail8d86e.ts.net
- NFS storage on sifaka (/volume1/torrents) shared between both services
- Declarative ZIM torrent list in kiwix's ConfigMap, synced to transmission via sidecar
- ZIM watcher CronJob for automatic kiwix restart when new archives complete
- Supports both GitOps (declarative) and interactive (web UI) torrent management

## Architecture Highlights
- **torrent namespace**: Standalone transmission with Tailscale ingress
- **kiwix namespace**: kiwix-serve with torrent-sync sidecar
- **Shared NFS PV**: Single PV referenced by PVCs in both namespaces
- **No backup needed**: Sifaka is RAID 5/6 and already the backup target

## Deployment and Testing
- [ ] Review plan document
- [ ] Verify NFS export on sifaka is feasible
- [ ] Approve architecture decisions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/35
2026-01-20 18:42:11 -08:00
f98103a58d P5 done 2026-01-20 15:04:46 -08:00
0439fbb704 P5: Migrate devpi to Kubernetes (#34)
## Summary
- Migrate devpi PyPI caching proxy from indri LaunchAgent to Kubernetes
- Custom container image with devpi-server + devpi-web + auto-init
- StatefulSet with 50Gi PVC, Tailscale Ingress at pypi.tail8d86e.ts.net
- Remove devpi from ansible playbooks and update CLAUDE.md with k8s workflow

## Key Changes
- Add CRI-O registry mirror config for registry.tail8d86e.ts.net
- Change ArgoCD apps to manual sync (was auto-sync causing issues)
- 2Gi memory limit for Whoosh indexer (reclaimed after startup)

## Deployment and Testing
- [x] devpi pod healthy in k8s
- [x] pip install through proxy works
- [x] mcquack 1.0.0 uploaded and installable
- [x] Old devpi stopped on indri

## Post-Merge
Reset ArgoCD to main:
```
argocd app set apps --revision main && argocd app sync apps
argocd app set devpi --revision main && argocd app sync devpi
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/34
2026-01-20 14:55:37 -08:00
b2307412fc Add P4 implementation notes and mark complete 2026-01-20 09:10:23 -08:00
735b643429 P4: Miniflux migration + PostgreSQL consolidation (#33)
## Summary
- Deploy miniflux in k8s via ArgoCD
- Expose via Tailscale Ingress at feed.tail8d86e.ts.net
- Retire brew PostgreSQL (no longer needed)
- Rename k8s-pg to pg (canonical hostname)
- Remove ansible miniflux and postgresql roles
- Update borgmatic to backup pg.tail8d86e.ts.net
- Update all zk documentation

## Deployment and Testing
- [x] Miniflux pod running in k8s
- [x] User login works at https://feed.tail8d86e.ts.net
- [x] Feeds and entries visible
- [x] brew miniflux and postgresql stopped
- [x] Tailscale services migrated (feed, pg)
- [x] zk documentation updated
- [x] Run ansible to apply role removals
- [ ] Verify borgmatic backup with new pg hostname

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33
2026-01-20 09:04:47 -08:00
463f476374 P3 done
Updated P3_postgresql.complete.md with full implementation notes including:
- borgmatic borg path fix
- Disaster recovery testing
- CloudNativePG managed roles for borgmatic user
- Dual database backup configuration
- ACL grant for homelab → k8s
- ArgoCD selfHeal disabled for feature branch workflow
- CNPG default values to prevent drift

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 18:19:33 -08:00
e69a3df2d4 P3 done 2026-01-19 18:03:48 -08:00
f0c28a3cdd Rename P2 plan to .complete.md 2026-01-19 15:06:27 -08:00
45dfefa8df Mark P2 complete with implementation notes
Documents lessons learned:
- SSH credential template for all forge repos
- Kustomize patches must omit namespace for matching
- Tailscale hostname cutover requires manual admin console deletion
- ArgoCD workflow: all apps target main, manual sync for control

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:06:14 -08:00
7e6742ad24 K8s Migration Phase 2: Grafana to Kubernetes (#30)
## Summary
- Migrate Grafana from Homebrew/Ansible to Kubernetes deployment
- Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed)
- Add Grafana Helm chart deployment via ArgoCD with multi-source pattern
- Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps)
- Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal`

## Key Changes
- `argocd/apps/grafana.yaml` - Grafana Helm chart Application
- `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps
- `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo
- `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0

## Deployment and Testing
- [x] Deploy Loki config change: `mise run provision-indri -- --tags loki`
- [x] Create namespace: `ki create namespace monitoring`
- [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -`
- [x] Sync ArgoCD apps (grafana, grafana-config)
- [x] Verify Grafana works at https://grafana.tail8d86e.ts.net
- [x] Remove svc:grafana from ansible tailscale_serve
- [x] Stop brew grafana: `ssh indri 'brew services stop grafana'`
- [x] Delete ansible grafana role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30
2026-01-19 14:40:25 -08:00
680ad1095b Rename P1 to complete 2026-01-19 10:03:52 -08:00
a8f4d00294 K8s Migration Phase 1: Infrastructure Setup (#29)
## Summary
- Split k8s migration plan into phases folder for easier navigation
- Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads
- Phase 1 work in progress

## Phase 1 Goals
- Tailscale Kubernetes Operator
- CloudNativePG Operator
- PostgreSQL cluster for future app migrations

## Deployment and Testing
- [ ] Review Phase 1 plan
- [ ] `mise run tailnet-preview` to verify ACL changes
- [ ] `mise run tailnet-up` to apply ACL changes
- [ ] Create Tailscale OAuth client (manual)
- [ ] Deploy operators and PostgreSQL cluster

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29
2026-01-19 09:49:52 -08:00
3679124ebd Expose Kubernetes API as Tailscale service (Step 0.14) (#27)
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)

The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.

## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
2026-01-18 12:49:20 -08:00
19a82373d5 K8s Migration Phase 0: Foundation Infrastructure (#26)
## Summary
- Step 0.1: Update Pulumi ACLs with tag:registry
- Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent
- Step 0.4: Add Zot to Tailscale Serve configuration
- Step 0.5: Create Zot metrics role for Prometheus scraping
- Step 0.6: Add Zot log collection to Alloy
- Step 0.7: Update indri-services-check with zot checks
- Step 0.8: Add podman role for container runtime
- Step 0.9: Add minikube role for Kubernetes cluster
- Step 0.10: Configure remote kubectl access with 1Password credentials

## Remaining Steps
- [ ] Step 0.11: Add minikube to indri-services-check
- [ ] Step 0.12: Create zettelkasten documentation
- [ ] Step 0.13: Verify main playbook (already done - roles added)

## Deployment and Testing
- [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net
- [x] Podman machine running on indri
- [x] Minikube cluster running on indri
- [x] kubectl access from gilbert working with 1Password credentials
- [ ] indri-services-check passes all checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/26
2026-01-18 12:06:28 -08:00
ee196b0c10 Fix Phase 0 plan based on review feedback (#25)
## Summary
- Step 0.3: Use launchctl unload/load pattern for handlers (consistent with existing handlers)
- Step 0.6: Correct file path - add zot logs to alloy defaults/main.yml
- Step 0.9: Use cri-o runtime instead of containerd
- Step 0.10: Simplify kubeconfig instructions - focus on goal not implementation

## Deployment and Testing
- [x] Documentation-only change, no deployment needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/25
2026-01-17 20:07:10 -08:00
c8433467c1 Add Kubernetes migration plan documentation (#24)
## Summary
- Comprehensive phased plan for migrating blumeops services to minikube
- Technical decisions documented: Zot registry, Podman driver, CloudNativePG, Tailscale Operator
- 9 migration phases with verification and rollback procedures
- LaunchAgent absolute path requirements documented
- Observability requirements (zk docs, logging, metrics, dashboards) for new services

## Deployment and Testing
- [x] Plan document created at `docs/k8s-migration.md`
- [ ] Review plan phases for completeness
- [ ] Validate technical decisions align with requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/24
2026-01-17 17:34:53 -08:00