blumeops

Author	SHA1	Message	Date
Erich Blume	ad968eea46	Remove tailscale_ci_gateway role and ACLs [CI: Test CI / test (pull_request) Successful in 4s] The Docker-based runner with Tailscale sidecar approach was abandoned in favor of host execution mode. Clean up the unused infrastructure: - Remove tailscale_ci_gateway role and its reference in indri.yml - Remove tag:ci-gateway ACL grants and tagOwners from pulumi policy - Plist already removed from indri Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 13:26:36 -08:00
Erich Blume	cfe5c0c0dd	Switch forgejo-runner to host execution mode [CI: Test CI / test (pull_request) Successful in 4s] Docker-based runner had networking issues reaching Forgejo from job containers. Host execution mode runs the runner daemon directly on indri, with jobs executing on the host. Actions that need Docker use host networking to access localhost:3001. - Runner binary compiled locally at ~/code/3rd/forgejo-runner - Labels use :host suffix instead of :docker://image - PATH set in launchd plist for mise-managed tools (node, etc.) - Container network set to "host" for actions needing Docker Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 13:23:39 -08:00
Erich Blume	c79dc94325	Fix forgejo-runner networking for tailnet access [CI: Test CI / test (pull_request) Failing after 32s] - Add --accept-routes to tailscale-ci-gateway for service routing - Run forgejo-runner as root for docker socket access - Mount actual docker socket path (not symlink) - Use gateway network namespace for tailnet connectivity - Registration uses gateway network for Forgejo access Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:56:25 -08:00
Erich Blume	911913bb2e	Fix launchd templates to use full docker path launchd agents don't have /usr/local/bin in PATH by default. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:37:04 -08:00
Erich Blume	fdf5153130	Containerize forgejo-runner with Tailscale gateway for tailnet access [CI: Test CI / test (pull_request) Failing after 48s] Architecture: - tailscale_ci_gateway role: Runs Tailscale container on tailnet-jobs network - forgejo_runner role: Runs runner daemon in container on same network - Job containers also use tailnet-jobs network This allows the runner and jobs to reach forge.tail8d86e.ts.net via the Tailscale gateway, avoiding hairpinning issues with localhost. Changes: - Add tailscale_ci_gateway role with launchd management - Refactor forgejo_runner to use containerized daemon - Runner registers with Tailscale URL instead of localhost - Job containers run on tailnet-jobs network - Update playbook role ordering (gateway before runner) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:28:35 -08:00
Erich Blume	018b44186f	Add tag:ci-gateway for Forgejo runner Tailscale sidecar - Add ci-gateway tag owner (admin and blumeops can assign) - Grant ci-gateway access to forge:443 for git operations - Grant ci-gateway access to registry:443 for container push/pull - Add ACL test for ci-gateway access Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:03:02 -08:00
Erich Blume	476b80e985	Use --add-host to map localhost to Docker host in job containers [CI: Test CI / test (pull_request) Failing after 40s] This allows containers to reach Forgejo at localhost:3001 for git operations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 10:40:35 -08:00
Erich Blume	15e3ec98ea	Use host networking for job containers [CI: Test CI / test (pull_request) Failing after 36s] Containers need to reach localhost:3001 (Forgejo) for git operations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 10:30:28 -08:00
Erich Blume	50b925791d	Update test workflow comment to trigger CI [CI: Test CI / test (pull_request) Failing after 1m15s] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 10:28:12 -08:00
Erich Blume	35136e361e	Add comment to test workflow to trigger CI run [CI: Test CI / test (pull_request) Failing after 0s] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 10:23:37 -08:00
Erich Blume	bcdee225e5	Replace k8s runner with ci-base image for local builds [CI: Test CI / test (pull_request) Failing after 1s] - Remove forgejo-runner k8s manifests and ArgoCD app (runner now on indri) - Remove build-runner workflow (no longer needed) - Add ci-base image with Ubuntu 22.04 + common CI tools - Add build-ci-base workflow to build the image - Update test workflow to check docker instead of buildah - Document bootstrap vs production mode for runner labels - Configure host.docker.internal:5050 for zot access from job containers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 09:23:24 -08:00
Erich Blume	f4178fce7d	Add ubuntu-latest labels to indri runner [CI: Test CI / test (pull_request) Failing after 1s] Now handles all workflows (test and build) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 08:59:13 -08:00
Erich Blume	6b4e0961ed	Add README explaining .github vs .forgejo directories [CI: Test CI / test (pull_request) Successful in 2s] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 08:52:14 -08:00
Erich Blume	2c284ed0cf	Switch container builds to indri docker-builder runner [CI: Test CI / test (pull_request) Successful in 3s; Build forgejo-runner / build (push) Failing after 0s] [tag: runner-v1.0.5] - Use Docker instead of buildah in composite action - Build workflows now run on docker-builder label - Add actionlint config for custom runner labels - Avoids nested containerization complexity in k8s Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 08:49:39 -08:00
Erich Blume	8b75b696f0	Fix forgejo_runner handler (no nested blocks) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 08:44:23 -08:00
Erich Blume	7a637d2ebf	Fix 1Password field name for runner token [CI: Test CI / test (pull_request) Successful in 3s] Use runner_reg field (matching existing k8s secret template) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:31:06 -08:00
Erich Blume	676c1782d1	Add forgejo_runner Ansible role for indri [CI: Test CI / test (pull_request) Successful in 2s] Run forgejo-runner directly on indri using Docker container mode instead of trying to build containers inside k8s pods. This avoids nested containerization complexity. Features: - Build from source using mise + Go - Docker container mode for job isolation - Can build containers via Docker socket - Labels: docker-builder (distinct from k8s runner) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:28:44 -08:00
Erich Blume	8d2e180d5d	Add subuid/subgid for rootless buildah [CI: Test CI / test (pull_request) Successful in 3s; Build forgejo-runner / build (push) Failing after 20s] [tag: runner-v1.0.4] Buildah needs UID/GID remapping to extract images with files owned by different users (root, shadow, etc). Configure subordinate UID/GID ranges for the runner user. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:13:03 -08:00
Erich Blume	a979ddaf0c	Use versioned runner image v1.0.1 [CI: Test CI / test (pull_request) Successful in 3s; Build forgejo-runner / build (push) Failing after 1m14s] [tag: runner-v1.0.2] - Remove imagePullPolicy: Always (rely on immutable tags) - Use explicit version tag instead of :latest Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:07:06 -08:00
Erich Blume	4e0767b4d9	Build forgejo-runner from source with proper user setup [CI: Test CI / test (pull_request) Successful in 3s; Build forgejo-runner / build (push) Failing after 2s] [tag: runner-v1.0.1] - Multi-stage build from mirrored forgejo-runner source - Create proper runner user with passwd entry (fixes buildah) - Use named user instead of numeric UID Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:00:19 -08:00
Erich Blume	0c1a3bf0cf	Remove test comment from Dockerfile [CI: Test CI / test (pull_request) Successful in 2s; Build forgejo-runner / build (push) Failing after 2s] [tag: runner-v1.0.0] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 21:41:01 -08:00
Erich Blume	3702e7eec2	Add tag-based container release workflow [CI: Test CI / test (pull_request) Successful in 3s] - Workflows trigger on git tags (e.g. runner-v1.0.0, devpi-v1.0.0) - Composite action takes explicit version, tags image with version + SHA - Add mise-tasks/container-list to enumerate containers and recent tags - Add mise-tasks/container-release to create release tags - Update CLAUDE.md with container release commands - TODO: investigate zot tag immutability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 21:34:33 -08:00
Erich Blume	b2967817d6	Add comment to test buildah workflow [CI: Test CI / test (pull_request) Successful in 3s]	2026-01-23 21:15:49 -08:00
Erich Blume	a3a61146a3	Fix SIGPIPE in test workflow by adding || true to piped commands [CI: Test CI / test (pull_request) Successful in 3s]	2026-01-23 21:14:02 -08:00
Erich Blume	6d8e6ea4c0	Update test workflow to verify buildah/podman instead of docker [CI: Test CI / test (pull_request) Failing after 12s]	2026-01-23 21:05:40 -08:00
Erich Blume	c2be742094	Add imagePullPolicy: Always to ensure fresh image pulls	2026-01-23 21:03:53 -08:00
Erich Blume	9f5dae5707	Switch to Buildah for container builds (no Docker socket needed) - Replace docker-cli with buildah/podman in runner image - Configure buildah for overlay storage with fuse-overlayfs - Add registry config for insecure local registry - Remove Docker socket mount and root security context from deployment - Update composite action to use buildah bud/push instead of docker Buildah is daemonless - no Docker socket required, cleaner security model. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 20:14:03 -08:00
Erich Blume	4c249ff116	Add docker group (GID 999) to runner security context	2026-01-23 19:44:43 -08:00
Erich Blume	4a3219648d	Add container build workflows with composite action - Create composite action: .forgejo/actions/build-push-image - Add build-runner.yaml workflow (triggers on Dockerfile changes) - Add build-devpi.yaml workflow (triggers on Dockerfile/start.sh changes) - Mount Docker socket in runner deployment for container builds - Run runner as root to access Docker socket Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 19:42:47 -08:00
Erich Blume	5fcd122494	Reorganize CI/CD bootstrap phases and add custom runner Dockerfile (#50) [CI: Test CI / test (push) Successful in 2s] ## Summary - Reorder CI/CD bootstrap phases to address chicken-and-egg problem - P2 is now "Custom Runner Image" (stock runner lacks Node.js) - Add P3 for "Mirror Forgejo & Build from Source" - Rename P3 -> P4 (Self-Deploy), P4 -> P5 (Container Builds) - Add Dockerfile for custom runner with Node.js, npm, docker, build tools - Update overview with new phase structure, host mode notes, and cross-compilation challenge ## Key Changes ### Phase Reordering | Old | New | Name | |-----|-----|------| | P1 | P1 | Enable Actions (complete) | | P2 | P2 | Custom Runner Image (new focus) | | - | P3 | Mirror Forgejo & Build (new) | | P3 | P4 | Self-Deploy | | P4 | P5 | Container Builds | ### Custom Runner Dockerfile The stock `forgejo/runner:3.5.1` image lacks Node.js, so `actions/checkout@v4` doesn't work. The new Dockerfile adds: - Node.js + npm (for GitHub Actions) - Docker CLI (for container builds) - Build tools (gcc, make, curl, jq) ### Bootstrap Strategy 1. Build custom runner image manually on gilbert (podman build) 2. Push to zot registry 3. Update deployment to use custom image 4. Then enable auto-build workflow for runner ## Deployment and Testing - [x] Review plan changes - [x] Build custom runner image manually and verify - [x] Update runner deployment - [x] Test `actions/checkout@v4` works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/50	2026-01-23 18:50:27 -08:00
Erich Blume	3bcad4189f	Add actionlint pre-commit hook for workflow validation (#49) [CI: Test CI / test (push) Successful in 0s] ## Summary - Fix workflow to use `github.` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.` aliases) - Pass untrusted inputs through environment variables (security best practice per actionlint) - Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally ## Deployment and Testing - [x] Pre-commit hooks all pass - [x] actionlint validates `.forgejo/workflows/test.yaml` successfully - [ ] Verify workflow runs without errors on Forge after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49	2026-01-23 17:56:24 -08:00
Erich Blume	6a436d141a	Update CI/CD plan: mark Phase 1 complete, add runner observability [CI: Test CI / test (push) Successful in 0s] - Mark Phase 1 (Enable Actions) as completed with date - Check off all verification items in P1 - Add Step 6 to Phase 4 for runner logging and metrics - Update overview table with status column Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 17:10:14 -08:00
Erich Blume	7893c41020	Enable Forgejo Actions (Phase 1) (#48) [CI: Test CI / test (push) Successful in 0s] ## Summary - Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password - Enable Forgejo Actions in config (`[actions] ENABLED = true`) - Add `repo.actions` to DEFAULT_REPO_UNITS - Clean up unused MySQL database fields (we use SQLite) ## Phase 1 Progress This PR covers the first part of Phase 1 (ci-cd-bootstrap plan): - [x] Refactor app.ini to ansible template - [x] Store secrets in 1Password - [x] Enable Actions in config - [ ] Deploy config changes (pending review) - [ ] Create runner registration token - [ ] Deploy runner to k8s - [ ] Test with simple workflow ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags forgejo` to deploy - [ ] Verify Forgejo restarts correctly - [ ] Verify Actions tab appears in repo settings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48	2026-01-23 17:00:12 -08:00
Erich Blume	016f1043c8	Retire k8s-migration plan and create ci-cd-bootstrap plan	2026-01-23 14:13:01 -08:00
Erich Blume	25fa2ea665	Update indri-services-check	2026-01-22 21:31:11 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl | kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl | kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	11075d4517	Remove logfmt parsing stage from Alloy k8s config The stage.match selector wasn't preventing Alloy from logging decode errors internally. Removing logfmt parsing entirely - JSON parsing handles most structured logs, and plain text logs still get collected. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 18:06:34 -08:00
Erich Blume	e6de7ba391	Fix Alloy logfmt decode errors for JSON logs (#46 ) ## Summary - Use `stage.match` to conditionally apply logfmt parsing only to lines that don't start with `{` - This prevents error spam like `"failed to decode logfmt" component_path=/ component_id=loki.process.pods component=stage type=logfmt err="logfmt syntax error at pos 2 on line 1: unexpected '\"'"` when JSON-formatted logs hit the logfmt parser ## Deployment and Testing - [ ] Sync alloy-k8s app to feature branch and verify errors stop appearing - [ ] Verify JSON logs are still parsed correctly - [ ] Verify logfmt logs (from Loki, Prometheus etc.) are still parsed correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/46	2026-01-22 18:00:34 -08:00
Erich Blume	16bfe06b7b	Fix LaunchDaemon check to use become: true LaunchDaemons run in the system domain and require sudo to query. Without become: true, the check always fails and tries to reload. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 17:34:23 -08:00
Erich Blume	57bf8512dc	Log filtering cleanup and observability improvements (#45 ) ## Summary - Suppress noisy storage-provisioner Endpoints deprecation warning (upstream minikube issue) - Disable thermal collector on indri Alloy (not supported on macOS M1) - Add macOS power/thermal metrics collection via powermetrics LaunchDaemon - Add Power & Thermal section to macOS Grafana dashboard - Add logfmt parser for k8s log level extraction (Loki, Prometheus, etc.) - Extract more fields from JSON logs (zot compatibility - uses "message" not "msg") - Silence logfmt parse errors for non-logfmt logs - Fix JSON escaping in devpi dashboard ## Deployment and Testing - [x] Deployed Alloy config changes to indri via ansible - [x] Synced alloy-k8s and grafana-config via ArgoCD - [x] Verified power metrics appearing in Prometheus - [x] Verified thermal collector errors stopped - [x] Verified logfmt parse errors silenced - [x] Verified devpi dashboard loads correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/45	2026-01-22 17:30:08 -08:00
Erich Blume	af39067e1f	Pin ArgoCD to v3.2.6 (#44 ) ## Summary - Pin ArgoCD kustomization to v3.2.6 tag instead of `stable` branch - This gives intentional control over ArgoCD version upgrades ## Deployment and Testing - [ ] Sync the `apps` application: `argocd app sync apps` - [ ] Point argocd at feature branch: `argocd app set argocd --revision feature/pin-argocd-v3.2.6` - [ ] Sync argocd: `argocd app sync argocd` - [ ] Verify ArgoCD is running v3.2.6 - [ ] After merge, reset to main: `argocd app set argocd --revision main && argocd app sync argocd` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/44	2026-01-22 16:38:27 -08:00
Erich Blume	e4a8405de7	Observability cleanup and k8s service monitoring (#43 ) (#43 ) ## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43	2026-01-22 13:51:01 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	5a829e0afd	Remove unused indri tags and ansible roles (#41 ) ## Summary - Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission - Also remove unused node_exporter and podman ansible roles - Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed) - Update indri description to reflect current architecture ## Changes Ansible roles removed (34 files, ~1000 lines): - devpi, devpi_metrics - kiwix - transmission, transmission_metrics - node_exporter - podman Pulumi indri tags removed: - tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed These services now run in k8s with their own Tailscale devices via tailscale-operator. ## Deployment and Testing - [x] Verified remaining ansible roles match indri.yml - [x] Verified no playbooks or role dependencies reference removed roles - [ ] Run `pulumi preview` to verify tag changes - [ ] Run `pulumi up` to apply tag changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41	2026-01-21 20:18:53 -08:00
Erich Blume	6a140107c6	P7 forgejo plan updated	2026-01-21 20:04:18 -08:00
Erich Blume	4dd74dfff8	complete P6	2026-01-21 19:16:04 -08:00
Erich Blume	2e7ca8a5ff	Add mise task to list unresolved PR comments (#40 ) ## Summary - New `pr-comments` mise task queries Forge API for unresolved review comments on a PR - Task takes a PR number as argument and displays all comments without a resolver - Updated CLAUDE.md to include using this task after user reviews PRs ## Deployment and Testing - [x] Tested task on PR #39 (shows no unresolved comments since all were resolved) - [x] Tested error handling with non-existent PR #9999 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/40	2026-01-21 19:14:27 -08:00
Erich Blume	7ec98210a9	P6: Migrate Kiwix and Transmission to Kubernetes (#39 ) ## Summary - Add Transmission BitTorrent daemon to k8s (torrent namespace) - Add Kiwix ZIM archive server to k8s (kiwix namespace) - NFS storage from sifaka for shared torrent/ZIM data - Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list - ZIM-watcher CronJob to auto-restart kiwix when new archives appear - Remove transmission, transmission_metrics, and kiwix ansible roles from indri - Remove svc:kiwix from tailscale_serve defaults ## Key Decisions - Direct NFS mount for kiwix (no PVC) since it shares storage with transmission - Shell wrapper for kiwix-serve command (glob expansion) - Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism) - Completed downloads stored in `/downloads/complete/` on sifaka ## Deployment and Testing - [x] Deployed transmission to k8s - [x] Verified transmission web UI at torrent.tail8d86e.ts.net - [x] Moved existing ZIM files to complete folder - [x] Deployed kiwix to k8s - [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net - [x] Stopped old services on indri - [x] Cleared svc:kiwix from Tailscale serve on indri - [x] Updated zk documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39	2026-01-21 18:07:40 -08:00
Erich Blume	89eff26301	complete P5.1	2026-01-21 16:32:01 -08:00
Erich Blume	21848a7919	P5.1: Migrate minikube from podman to QEMU2 driver (#38 ) ## Summary - Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support - Update ansible minikube role with qemu installation and containerd runtime - Remove podman role dependency from indri.yml - Add synology user creation steps and post-migration zot reconfiguration notes ## Why Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support. ## Deployment and Testing - [ ] Create k8s-storage user on Synology DSM - [ ] Store credentials in 1Password (synology-k8s-storage) - [ ] Export current k8s state - [ ] Stop and delete podman-based minikube cluster - [ ] Run ansible to create QEMU2 cluster - [ ] Test NFS volume mount with test pod - [ ] Redeploy ArgoCD and all apps - [ ] Verify all services healthy - [ ] Reconfigure zot registry mirrors for containerd (post-migration) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38	2026-01-21 16:03:37 -08:00

131 commits