blumeops

Author	SHA1	Message	Date
Erich Blume	46081f5f10	Update rule 10: also require permission to push to main Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:45:44 -08:00
Erich Blume	55522579dc	Add rule: never merge PRs without explicit user request Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:43:51 -08:00
Erich Blume	a93f2a77e1	Merge pull request 'Migrate remaining secrets to ExternalSecrets' (#67) from feature/migrate-remaining-secrets into main	2026-01-28 20:41:45 -08:00
Erich Blume	8f4660915d	Fix argocd SSH key format for 1Password Connect 1Password Connect doesn't support ?ssh-format=openssh, so we need a separate Secure Note item with the OpenSSH-formatted key. Created new 1Password item: argocd-forge-ssh-key Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:31:16 -08:00
Erich Blume	9114aac8f6	Switch all ExternalSecrets to creationPolicy: Owner ESO now has full ownership of these secrets. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:27:16 -08:00
Erich Blume	dd6cf20d51	Remove obsolete secret templates - Delete 13 .yaml.tpl files replaced by ExternalSecrets - Update immich/README.md with direct CNPG secret copy instructions - Update miniflux/README.md with context flag and ESO note Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:26:37 -08:00
Erich Blume	351528474c	Add ExternalSecrets for remaining k8s secrets Migrate 10 secret templates to ESO ExternalSecrets with 1Password Connect: - databases: eblume, borgmatic, teslamate passwords - tailscale-operator: OAuth client credentials - grafana-config: admin password, teslamate datasource - teslamate: db password, encryption key - forgejo-runner: runner registration token - argocd: forge SSH credentials All use creationPolicy: Merge for safe migration from existing secrets. Skipped: - miniflux/secret-db: Uses CNPG secret, not 1Password directly - immich/secret-db: Requires 1Password item creation first - 1password-connect: Bootstrap secret, must stay as template Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 19:50:38 -08:00
Erich Blume	482414346e	Add External Secrets Operator with 1Password Connect (#66 ) (#66 ) ## Summary - Add 1Password Connect server for secrets automation API - Add External Secrets Operator (ESO) to sync secrets from 1Password to K8s - Add ClusterSecretStore connecting ESO to 1Password Connect - Convert devpi secret to ExternalSecret as proof of concept ## Architecture ``` 1Password Cloud → 1Password Connect (k8s) → ESO → Native K8s Secrets ``` ## Deployment and Testing - [ ] Mirror Helm charts to forge (connect-helm-charts, external-secrets) - DONE - [ ] Create 1Password Connect credentials (`op connect server create`) - [ ] Store credentials in 1Password item "1Password Connect" - [ ] Bootstrap secret: `op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl \| kubectl apply -f -` - [ ] Deploy 1password-connect: `argocd app sync 1password-connect` - [ ] Deploy external-secrets: `argocd app sync external-secrets` - [ ] Deploy external-secrets-config: `argocd app sync external-secrets-config` - [ ] Test devpi ExternalSecret: `argocd app sync devpi` - [ ] Verify secret synced: `kubectl get externalsecret -n devpi` ## Future Work After PoC validated, migrate remaining 12 secret templates to ExternalSecrets: - databases (3), tailscale-operator (1), grafana-config (2), teslamate (2) - forgejo-runner (1), argocd (1), immich (1), 1password-connect (1 - self-bootstrap) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/66	2026-01-28 19:30:10 -08:00
Erich Blume	3971670832	Remove immich-sync ansible role (#65 ) ## Summary - Remove immich_sync ansible role (server-side photo sync via osxphotos) - The Immich iOS app has built-in automatic backup that replaces this functionality - iOS app supports foreground/background backup and can sync iCloud photos directly ## Deployment and Testing - [ ] Clean up files on indri (see manual cleanup commands below) - [ ] Configure Immich iOS app for automatic backup ### Manual cleanup on indri: ```bash # Unload and remove LaunchAgent launchctl unload ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist rm ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist # Remove script and credentials rm ~/bin/immich-sync.sh rm ~/.immich-api-key # Remove logs rm ~/Library/Logs/mcquack.immich-sync.*.log # Optionally remove export directory (check if empty first) ls ~/Pictures/immich-export # rm -r ~/Pictures/immich-export ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/65	2026-01-28 08:49:22 -08:00
Erich Blume	aa4464f84e	Upgrade Immich from v2.4.1 to v2.5.0 (#64 ) ## Summary - Upgrades Immich image tag from v2.4.1 to v2.5.0 ## Deployment and Testing - [ ] Point immich ArgoCD app at feature branch and sync - [ ] Verify pods come up healthy - [ ] Verify Immich web UI accessible - [ ] Reset to main and sync after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/64	2026-01-27 20:51:09 -08:00
Erich Blume	54945be0e3	Add immich-sync ansible role for photo library sync (#63 ) ## Summary - Add `immich_sync` role that syncs macOS Photos Library to Immich - Uses osxphotos to export photos with metadata to staging directory - Uses immich-cli (via Docker) to upload to Immich server - LaunchAgent schedules hourly syncs following mcquack pattern - API key fetched from 1Password in playbook pre_tasks ## Architecture ``` Photos Library → osxphotos export → ~/Pictures/immich-export/ → immich-cli upload → Immich ``` ## Prerequisites (manual) - Install osxphotos on indri: Add `"pipx:osxphotos" = "latest"` to `~/.config/mise/config.toml`, run `mise install` - Docker is already installed on indri ## Deployment and Testing - [ ] Dry run: `mise run provision-indri -- --tags immich_sync --check --diff` - [ ] Deploy: `mise run provision-indri -- --tags immich_sync` - [ ] Verify LaunchAgent: `ssh indri 'launchctl list \| grep immich'` - [ ] Test manual sync: `ssh indri '~/bin/immich-sync.sh'` - [ ] Check logs: `ssh indri 'tail -50 ~/Library/Logs/mcquack.immich-sync.out.log'` - [ ] Verify photos in Immich at https://photos.ops.eblu.me 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/63	2026-01-26 12:38:38 -08:00
Erich Blume	8621996343	Add Immich photo management + migrate forge URLs (#62 ) ## Summary - Migrate all ArgoCD app repo URLs from `indri.tail8d86e.ts.net:2200` to `forge.ops.eblu.me:2222` - Add Immich self-hosted photo management service with: - Helm chart deployment via ArgoCD - PostgreSQL cluster with pgvecto.rs for AI vector search (immich-pg) - NFS storage on sifaka for photo library (2Ti) - Tailscale Ingress + Caddy proxy for `photos.ops.eblu.me` - Machine learning service for face/object recognition ## Deployment and Testing - [x] Update ArgoCD repo-creds-forge secret with new URL (one-time manual step) - [ ] Sync `apps` to pick up new applications - [ ] Sync all existing apps to verify new forge URL works - [ ] Sync `blumeops-pg` to deploy immich-pg cluster - [ ] Wait for immich-pg to be healthy - [ ] Create immich-db secret from auto-generated password - [ ] Sync `immich-storage` (PV, PVC, Ingress) - [ ] Sync `immich` (Helm chart) - [ ] Run `mise run provision-indri -- --tags caddy` to add photos.ops.eblu.me - [ ] Verify Immich UI is accessible 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/62	2026-01-26 11:20:11 -08:00
Erich Blume	c8b655f177	Build local containers for k8s services (#61 ) ## Summary - Move devpi Dockerfile from argocd/manifests to containers/devpi/ - Add containers for: transmission, teslamate, miniflux, kiwix-serve, kubectl - Update all k8s deployments to use local images (registry.ops.eblu.me/blumeops/*) - All containers use v1.0.0 tag for initial release ## Containers Added \| Container \| Source \| Notes \| \|-----------\|--------\|-------\| \| devpi \| python:3.12-slim \| Existing, moved to containers/ \| \| kubectl \| alpine + download \| For zim-watcher CronJob \| \| miniflux \| Go build from source \| v2.2.16 \| \| kiwix-serve \| Download pre-built binary \| v3.8.1 \| \| transmission \| alpine + apk install \| Simpler than linuxserver image \| \| teslamate \| Elixir build from source \| v2.2.0 \| ## Deployment and Testing - [ ] Build and tag devpi-v1.0.0 - [ ] Build and tag kubectl-v1.0.0 - [ ] Build and tag miniflux-v1.0.0 - [ ] Build and tag kiwix-serve-v1.0.0 - [ ] Build and tag transmission-v1.0.0 - [ ] Build and tag teslamate-v1.0.0 - [ ] Sync ArgoCD apps and verify services 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/61	2026-01-25 21:35:57 -08:00
Erich Blume	ea42362b6f	Migrate Forgejo runner to Kubernetes with DinD (#60 ) ## Summary - Deploy Forgejo runner to k8s with Docker-in-Docker sidecar - Add job execution image with Node.js and Docker CLI - Retire host-mode runner on indri - All CI jobs now run containerized in k8s ## Components Added - `containers/forgejo-runner/Dockerfile` - Job execution image - `argocd/apps/forgejo-runner.yaml` - ArgoCD Application - `argocd/manifests/forgejo-runner/` - Kubernetes manifests ## Components Removed - `ansible/roles/forgejo_runner/` - No longer needed ## Changes to Existing Files - `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env - `.github/actionlint.yaml` - Only `k8s` label now valid ## Deployment 1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl \| kubectl --context=minikube-indri apply -f -` 2. Sync ArgoCD: `argocd app sync forgejo-runner` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60	2026-01-25 19:56:17 -08:00
Erich Blume	66badfafd1	Migrate k8s services to Caddy (.ops.eblu.me) (#59) (tag: nettest-v0.8.0) ## Summary - Add Caddy reverse proxy routes for all k8s services (grafana, argocd, prometheus, loki, miniflux, devpi, kiwix, torrent, teslamate) - Add PostgreSQL via Caddy L4 TCP proxy on port 5432 - Caddy proxies to existing Tailscale endpoints - traffic stays local on indri - Both `.ops.eblu.me` and `.tail8d86e.ts.net` URLs continue to work ## Updated References - Alloy: prometheus/loki push endpoints → `.ops.eblu.me` - Borgmatic: PostgreSQL backup host → `pg.ops.eblu.me` - Devpi: DEVPI_OUTSIDE_URL → `pypi.ops.eblu.me` - indri-services-check: health check URLs - CLAUDE.md: argocd login command ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags caddy` to deploy new Caddy config - [ ] Test HTTP services: `curl https://grafana.ops.eblu.me/api/health` - [ ] Test PostgreSQL: `pg_isready -h pg.ops.eblu.me -p 5432` - [ ] Run `mise run provision-indri -- --tags alloy` to update Alloy endpoints - [ ] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic - [ ] Sync devpi in ArgoCD: `argocd app sync devpi` - [ ] Re-login to ArgoCD: `argocd login argocd.ops.eblu.me ...` - [ ] Run `mise run indri-services-check` to verify all services 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/59	2026-01-25 12:56:31 -08:00
Erich Blume	d6e6b48f6a	Migrate registry to Caddy (registry.ops.eblu.me) (#58 ) ## Summary - Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me` - Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy) - Update minikube containerd config for new registry URL - Update devpi manifest, CI actions, and mise tasks ## Deployment and Testing - [ ] Run `mise run provision-indri -- --check --diff` (dry run) - [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config - [ ] Sync devpi ArgoCD app: `argocd app sync devpi` - [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'` - [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog` - [ ] Run `mise run indri-services-check` to verify all services healthy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58	2026-01-25 12:06:15 -08:00
Erich Blume	9c1b7c7ca1	Update docs for Caddy migration (#57 ) ## Summary - Update CLAUDE.md with new service routing documentation - Document the two DNS domains: `.ops.eblu.me` (Caddy) vs `.tail8d86e.ts.net` (Tailscale) - Fix incorrect service listings (Prometheus/Loki are in k8s, not indri) ## ZK Updates (not in this PR) Also updated the blumeops zk card with: - Source code URL (forge is primary, GitHub is mirror) - Services split into Caddy vs Tailscale sections - Updated port map for Caddy - Updated "Adding a New Service" instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/57	2026-01-25 11:52:35 -08:00
Erich Blume	1184b4de1d	Add Caddy layer4 for Forgejo SSH (#56 ) ## Summary - Add layer4 TCP proxy configuration to Caddyfile template for SSH services - Configure Forgejo SSH on port 2222 → localhost:2200 - Switch HTTPS from port 8443 (testing) to 443 (production) - Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin ## What This Enables Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from: - Tailnet clients (gilbert) - Docker containers on indri - Kubernetes pods in minikube This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names. ## Testing Done - [x] Caddy rebuilt with layer4 plugin - [x] Validated Caddyfile syntax - [x] Cleared `svc:forge` from tailscale serve - [x] Verified HTTPS works: `curl https://forge.ops.eblu.me` - [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me` - [x] Verified git clone works via new endpoint - [x] Verified minikube pods can reach both HTTPS and SSH endpoints ## Deployment Caddy is already running with the new config on indri. This PR captures the ansible changes. ## Next Steps - Update zk docs with new git remote format - Migrate registry and other services to Caddy - Retire tailscale_services ansible role 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56	2026-01-25 11:37:23 -08:00
Erich Blume	682a68dc9c	Add Caddy reverse proxy for blumeops services (#55 ) ## Summary - Add Caddy ansible role following zot pattern (manual build, ansible deploy) - Caddy built with Gandi DNS plugin for ACME DNS-01 challenges - Gandi PAT fetched from 1Password and written to secured file on indri - Configure wildcard TLS for `*.ops.eblu.me` - Initial services: forge, registry (indri-local) - Uses port 8443 during testing to avoid Tailscale serve conflicts ## Build Instructions (already done) On indri: ```bash cd ~/code/3rd/caddy && mise run build ``` ## Deployment and Testing - [ ] Review Caddyfile configuration - [ ] Run `mise run provision-indri -- --tags caddy` to deploy - [ ] Test: `curl -v https://forge.ops.eblu.me:8443` (should get TLS cert) - [ ] Test: `curl -v https://registry.ops.eblu.me:8443/v2/` (should return `{}`) - [ ] Once verified, switch to port 443 and migrate services from Tailscale serve ## Files Changed - `ansible/playbooks/indri.yml` - Add pre_task for Gandi PAT, add caddy role - `ansible/roles/caddy/` - New role with Caddyfile and LaunchAgent templates 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/55	2026-01-25 09:35:06 -08:00
Erich Blume	b08faa50cc	Add Gandi DNS management via Pulumi (#54 ) ## Summary - Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/` - Add Gandi LiveDNS management for `eblu.me` domain - Create wildcard DNS record `.ops.eblu.me` → indri's Tailscale IP (100.98.163.89) - Add mise tasks: `dns-up`, `dns-preview` - Update `tailnet-up` to pass `--yes` by default - Document PAT cycling process (expires every 30 days) ## Background This enables using real DNS names (`.ops.eblu.me`) that resolve to Tailscale IPs, which allows containers and other systems to resolve services without depending on MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain tailnet-only while using standard DNS. ## Deployment and Testing - [ ] Run `cd pulumi/gandi && uv sync` to install dependencies - [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack - [ ] Run `mise run dns-preview` to verify configuration - [ ] Run `mise run dns-up` to apply DNS records - [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54	2026-01-25 08:15:46 -08:00
Erich Blume	af3536bc17	Simplify indri IP extraction from tailscale status (tag: nettest-v0.7.0) Use simple grep and awk to parse plain text tailscale status output instead of trying to parse JSON. Also show the status output for debugging. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 20:04:28 -08:00
Erich Blume	04fdd5906d	Get indri's IP from tailscale status for registry access (tag: nettest-v0.6.0) Use 'tailscale status' to get indri's Tailscale IP and add it to /etc/hosts for registry hostname resolution. The registry service runs on indri, so we need indri's IP specifically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 20:00:57 -08:00
Erich Blume	16ccabbc34	Resolve registry IP via Tailscale and add to /etc/hosts (tag: nettest-v0.5.0) The Tailscale container's DNS doesn't work because it runs in userspace mode. Instead, resolve the registry IP using 'tailscale ip' and add it to /etc/hosts inside the container before running skopeo. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 19:44:39 -08:00
Erich Blume	b4b8abb2d9	Add tag:ci-gateway to Tailscale ACL for CI builds (tag: nettest-v0.4.0) - Add tag:ci-gateway to tagOwners - Grant ci-gateway access to registry on port 443 - Add test for ci-gateway -> registry access Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 19:38:02 -08:00
Erich Blume	9f98b3007e	Add debugging for Tailscale container failures (tag: nettest-v0.3.0) Capture container logs when the Tailscale sidecar exits unexpectedly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 19:32:40 -08:00
Erich Blume	424647cd93	Use Tailscale sidecar for container registry push (tag: nettest-v0.2.0) Docker Desktop's VM can't resolve tailnet hostnames. Work around this by: 1. Starting a Tailscale container that joins the tailnet 2. Building the image with docker build 3. Saving to tarball with docker save 4. Pushing via skopeo inside the Tailscale container Uses TS_CI_GATEWAY_AUTHKEY repository secret for authentication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 19:29:01 -08:00
Erich Blume	b598ea8d5a	Add indri-runner-logs script to fetch workflow logs Fetches logs for Forgejo Actions runs from indri's local storage. Logs are stored as zstd-compressed files in the forgejo data directory. Usage: mise run indri-runner-logs <run_id> Only works for runs executed by the indri-host-runner. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:25:32 -08:00
Erich Blume	31697b4d63	Add nettest container for CI/CD network debugging (#52) (tag: nettest-v0.1.0) ## Summary - Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script - Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags - Test script checks DNS resolution and HTTPS connectivity to forge and registry ## Deployment and Testing - [ ] Merge PR to main - [ ] Run `mise run container-release nettest v0.1.0` to trigger first build - [ ] Verify workflow runs successfully and container can reach tailnet services - [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52	2026-01-24 16:54:35 -08:00
Erich Blume	13b8d16eb2	Remove mention of plans dir	2026-01-24 16:22:39 -08:00
Erich Blume	ceba6b3c2c	Remove plans, they dont seem to work	2026-01-24 16:21:49 -08:00
Erich Blume	8ca8798121	Switch to Buildah for container builds (#51) ## Summary - Replace Docker with Buildah for container image builds - No Docker socket required - buildah is daemonless - Cleaner security model (no privileged containers or socket mounting) - Remove Docker-related security context from deployment ## Changes - Update Dockerfile to install buildah/podman instead of docker-cli - Configure buildah storage with overlay driver and fuse-overlayfs - Update composite action to use `buildah bud` and `buildah push` - Add `imagePullPolicy: Always` to ensure fresh image pulls - Update test workflow to verify buildah/podman ## Testing - [ ] Runner pod starts successfully - [ ] Buildah is available in runner - [ ] Test workflow verifies buildah/podman versions - [ ] Container build workflow builds and pushes to zot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51	2026-01-24 13:30:26 -08:00
Erich Blume	5fcd122494	Reorganize CI/CD bootstrap phases and add custom runner Dockerfile (#50) ## Summary - Reorder CI/CD bootstrap phases to address chicken-and-egg problem - P2 is now "Custom Runner Image" (stock runner lacks Node.js) - Add P3 for "Mirror Forgejo & Build from Source" - Rename P3 -> P4 (Self-Deploy), P4 -> P5 (Container Builds) - Add Dockerfile for custom runner with Node.js, npm, docker, build tools - Update overview with new phase structure, host mode notes, and cross-compilation challenge ## Key Changes ### Phase Reordering \| Old \| New \| Name \| \|-----\|-----\|------\| \| P1 \| P1 \| Enable Actions (complete) \| \| P2 \| P2 \| Custom Runner Image (new focus) \| \| - \| P3 \| Mirror Forgejo & Build (new) \| \| P3 \| P4 \| Self-Deploy \| \| P4 \| P5 \| Container Builds \| ### Custom Runner Dockerfile The stock `forgejo/runner:3.5.1` image lacks Node.js, so `actions/checkout@v4` doesn't work. The new Dockerfile adds: - Node.js + npm (for GitHub Actions) - Docker CLI (for container builds) - Build tools (gcc, make, curl, jq) ### Bootstrap Strategy 1. Build custom runner image manually on gilbert (podman build) 2. Push to zot registry 3. Update deployment to use custom image 4. Then enable auto-build workflow for runner ## Deployment and Testing - [x] Review plan changes - [x] Build custom runner image manually and verify - [x] Update runner deployment - [x] Test `actions/checkout@v4` works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/50	2026-01-23 18:50:27 -08:00
Erich Blume	3bcad4189f	Add actionlint pre-commit hook for workflow validation (#49) ## Summary - Fix workflow to use `github.` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.` aliases) - Pass untrusted inputs through environment variables (security best practice per actionlint) - Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally ## Deployment and Testing - [x] Pre-commit hooks all pass - [x] actionlint validates `.forgejo/workflows/test.yaml` successfully - [ ] Verify workflow runs without errors on Forge after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49	2026-01-23 17:56:24 -08:00
Erich Blume	6a436d141a	Update CI/CD plan: mark Phase 1 complete, add runner observability - Mark Phase 1 (Enable Actions) as completed with date - Check off all verification items in P1 - Add Step 6 to Phase 4 for runner logging and metrics - Update overview table with status column Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 17:10:14 -08:00
Erich Blume	7893c41020	Enable Forgejo Actions (Phase 1) (#48) ## Summary - Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password - Enable Forgejo Actions in config (`[actions] ENABLED = true`) - Add `repo.actions` to DEFAULT_REPO_UNITS - Clean up unused MySQL database fields (we use SQLite) ## Phase 1 Progress This PR covers the first part of Phase 1 (ci-cd-bootstrap plan): - [x] Refactor app.ini to ansible template - [x] Store secrets in 1Password - [x] Enable Actions in config - [ ] Deploy config changes (pending review) - [ ] Create runner registration token - [ ] Deploy runner to k8s - [ ] Test with simple workflow ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags forgejo` to deploy - [ ] Verify Forgejo restarts correctly - [ ] Verify Actions tab appears in repo settings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48	2026-01-23 17:00:12 -08:00
Erich Blume	016f1043c8	Retire k8s-migration plan and create ci-cd-bootstrap plan	2026-01-23 14:13:01 -08:00
Erich Blume	25fa2ea665	Update indri-services-check	2026-01-22 21:31:11 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47 ) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl \| kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl \| kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	11075d4517	Remove logfmt parsing stage from Alloy k8s config The stage.match selector wasn't preventing Alloy from logging decode errors internally. Removing logfmt parsing entirely - JSON parsing handles most structured logs, and plain text logs still get collected. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 18:06:34 -08:00
Erich Blume	e6de7ba391	Fix Alloy logfmt decode errors for JSON logs (#46 ) ## Summary - Use `stage.match` to conditionally apply logfmt parsing only to lines that don't start with `{` - This prevents error spam like `"failed to decode logfmt" component_path=/ component_id=loki.process.pods component=stage type=logfmt err="logfmt syntax error at pos 2 on line 1: unexpected '\"'"` when JSON-formatted logs hit the logfmt parser ## Deployment and Testing - [ ] Sync alloy-k8s app to feature branch and verify errors stop appearing - [ ] Verify JSON logs are still parsed correctly - [ ] Verify logfmt logs (from Loki, Prometheus etc.) are still parsed correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/46	2026-01-22 18:00:34 -08:00
Erich Blume	16bfe06b7b	Fix LaunchDaemon check to use become: true LaunchDaemons run in the system domain and require sudo to query. Without become: true, the check always fails and tries to reload. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 17:34:23 -08:00
Erich Blume	57bf8512dc	Log filtering cleanup and observability improvements (#45 ) ## Summary - Suppress noisy storage-provisioner Endpoints deprecation warning (upstream minikube issue) - Disable thermal collector on indri Alloy (not supported on macOS M1) - Add macOS power/thermal metrics collection via powermetrics LaunchDaemon - Add Power & Thermal section to macOS Grafana dashboard - Add logfmt parser for k8s log level extraction (Loki, Prometheus, etc.) - Extract more fields from JSON logs (zot compatibility - uses "message" not "msg") - Silence logfmt parse errors for non-logfmt logs - Fix JSON escaping in devpi dashboard ## Deployment and Testing - [x] Deployed Alloy config changes to indri via ansible - [x] Synced alloy-k8s and grafana-config via ArgoCD - [x] Verified power metrics appearing in Prometheus - [x] Verified thermal collector errors stopped - [x] Verified logfmt parse errors silenced - [x] Verified devpi dashboard loads correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/45	2026-01-22 17:30:08 -08:00
Erich Blume	af39067e1f	Pin ArgoCD to v3.2.6 (#44 ) ## Summary - Pin ArgoCD kustomization to v3.2.6 tag instead of `stable` branch - This gives intentional control over ArgoCD version upgrades ## Deployment and Testing - [ ] Sync the `apps` application: `argocd app sync apps` - [ ] Point argocd at feature branch: `argocd app set argocd --revision feature/pin-argocd-v3.2.6` - [ ] Sync argocd: `argocd app sync argocd` - [ ] Verify ArgoCD is running v3.2.6 - [ ] After merge, reset to main: `argocd app set argocd --revision main && argocd app sync argocd` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/44	2026-01-22 16:38:27 -08:00
Erich Blume	e4a8405de7	Observability cleanup and k8s service monitoring (#43 ) (#43 ) ## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43	2026-01-22 13:51:01 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	5a829e0afd	Remove unused indri tags and ansible roles (#41 ) ## Summary - Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission - Also remove unused node_exporter and podman ansible roles - Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed) - Update indri description to reflect current architecture ## Changes Ansible roles removed (34 files, ~1000 lines): - devpi, devpi_metrics - kiwix - transmission, transmission_metrics - node_exporter - podman Pulumi indri tags removed: - tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed These services now run in k8s with their own Tailscale devices via tailscale-operator. ## Deployment and Testing - [x] Verified remaining ansible roles match indri.yml - [x] Verified no playbooks or role dependencies reference removed roles - [ ] Run `pulumi preview` to verify tag changes - [ ] Run `pulumi up` to apply tag changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41	2026-01-21 20:18:53 -08:00
Erich Blume	6a140107c6	P7 forgejo plan updated	2026-01-21 20:04:18 -08:00
Erich Blume	4dd74dfff8	complete P6	2026-01-21 19:16:04 -08:00
Erich Blume	2e7ca8a5ff	Add mise task to list unresolved PR comments (#40 ) ## Summary - New `pr-comments` mise task queries Forge API for unresolved review comments on a PR - Task takes a PR number as argument and displays all comments without a resolver - Updated CLAUDE.md to include using this task after user reviews PRs ## Deployment and Testing - [x] Tested task on PR #39 (shows no unresolved comments since all were resolved) - [x] Tested error handling with non-existent PR #9999 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/40	2026-01-21 19:14:27 -08:00
Erich Blume	7ec98210a9	P6: Migrate Kiwix and Transmission to Kubernetes (#39 ) ## Summary - Add Transmission BitTorrent daemon to k8s (torrent namespace) - Add Kiwix ZIM archive server to k8s (kiwix namespace) - NFS storage from sifaka for shared torrent/ZIM data - Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list - ZIM-watcher CronJob to auto-restart kiwix when new archives appear - Remove transmission, transmission_metrics, and kiwix ansible roles from indri - Remove svc:kiwix from tailscale_serve defaults ## Key Decisions - Direct NFS mount for kiwix (no PVC) since it shares storage with transmission - Shell wrapper for kiwix-serve command (glob expansion) - Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism) - Completed downloads stored in `/downloads/complete/` on sifaka ## Deployment and Testing - [x] Deployed transmission to k8s - [x] Verified transmission web UI at torrent.tail8d86e.ts.net - [x] Moved existing ZIM files to complete folder - [x] Deployed kiwix to k8s - [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net - [x] Stopped old services on indri - [x] Cleared svc:kiwix from Tailscale serve on indri - [x] Updated zk documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39	2026-01-21 18:07:40 -08:00

1,033 commits