blumeops

Author	SHA1	Message	Date
Erich Blume	343d066701	Simplify runner image and workflows (Dagger Phase 3) Remove Node.js, Docker CLI, buildx, skopeo, gnupg, lsb-release, and xz-utils from the job execution image — all build tools now live inside Dagger containers. Add tzdata (for TZ env var support) and flyctl. Remove "Ensure Dagger CLI" bootstrap steps from both workflows and the "Install flyctl" step from build-blumeops. Set TZ=America/Los_Angeles in the runner configmap so all job containers inherit it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 17:23:37 -08:00
Forgejo Actions	996afbcf6f	Update docs release to v1.6.5 [skip ci]	2026-02-11 17:10:29 -08:00
Forgejo Actions	6ce03df819	Update docs release to v1.6.4 - Built changelog from towncrier fragments [skip ci]	2026-02-12 01:01:23 +00:00
Erich Blume	2a04ab26b7	Mount host zoneinfo into runner for TZ support (#160 ) ## Summary The `TZ=America/Los_Angeles` env var from #159 has no effect because the `forgejo/runner` image doesn't ship tzdata. Mount the node's `/usr/share/zoneinfo` into the container so the timezone database is available. ## Deployment After merge, sync forgejo-runner and verify: ``` argocd app sync forgejo-runner kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date # Should show PST/PDT, not UTC ``` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/160	2026-02-11 16:57:11 -08:00
Erich Blume	42ebc2b122	Fix Forgejo runner timezone (UTC -> America/Los_Angeles) (#159 ) ## Summary - Set `TZ=America/Los_Angeles` on the Forgejo runner container The runner pod defaults to UTC. When releases are cut in the evening PST, towncrier stamps changelog entries with tomorrow's date (e.g., v1.6.2 shows 2026-02-12 despite being released on the evening of Feb 11 PST). ## Deployment After merge, sync the forgejo-runner ArgoCD app: ``` argocd app sync forgejo-runner ``` The runner pod will restart with the new timezone. Note: the v1.6.2 changelog entry will remain dated 2026-02-12; future entries will use PST dates, so dates may appear non-sequential once. Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/159	2026-02-11 16:53:41 -08:00
Forgejo Actions	e5d1e795e0	Update docs release to v1.6.3 [skip ci]	2026-02-12 00:46:35 +00:00
Forgejo Actions	a75089d8ef	Update docs release to v1.6.2 - Built changelog from towncrier fragments [skip ci]	2026-02-12 00:35:02 +00:00
Erich Blume	1bc2b421a8	Adopt Dagger CI for container builds (Phase 1) (#156 ) [All checks were successful. Build Container / build (push): Successful in 13s] ## Summary - Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images - Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml` - BuildKit's native push is compatible with Zot — skopeo workaround eliminated - Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0 - Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner) - Delete old `.forgejo/actions/build-push-image/` composite action - Add GPLv3 LICENSE ## Verified locally - `dagger call build --src=. --container-name=nettest` — builds ✓ - `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓ - `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓ - Dagger CLI accessible inside built runner image ✓ ## Deployment sequence (after merge) 1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner 2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in) 3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline 4. `mise run container-list` — verify tags ## Not included (future phases) - Phase 2: docs build + Forgejo packages migration - Phase 3: runner simplification (remove skopeo, Node.js, etc.) - Phase 4: future workflows Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156	2026-02-11 15:38:31 -08:00
Forgejo Actions	362ae22ab7	Update docs release to v1.6.1 - Built changelog from towncrier fragments [skip ci]	2026-02-11 21:37:34 +00:00
Forgejo Actions	eca01a9546	Update docs release to v1.6.0 - Built changelog from towncrier fragments [skip ci]	2026-02-11 21:33:57 +00:00
Forgejo Actions	ab6661f5dd	Update docs release to v1.5.4 - Built changelog from towncrier fragments [skip ci]	2026-02-11 20:17:12 +00:00
Forgejo Actions	a106f92c38	Update docs release to v1.5.3 - Built changelog from towncrier fragments [skip ci]	2026-02-11 15:53:49 +00:00
Erich Blume	f0ac04fb8a	Bootstrap buildx: revert to docker build, bump runner to v2.5.1 (#148 ) [All checks were successful. Build Container / build (push): Successful in 1m56s] ## Summary - Temporarily revert composite action to `docker build` so we can build the runner image (chicken-and-egg: current runner v2.5.0 doesn't have buildx) - Bump runner label to `v2.5.1` so after sync the new runner image (with buildx) gets used ## Deployment plan 1. Merge this PR 2. Tag `forgejo-runner-v2.5.1` — builds with legacy `docker build` (one last time) 3. Sync forgejo-runner in ArgoCD to pick up the v2.5.1 label 4. Follow-up PR: switch action back to `docker buildx build` 5. Tag `nettest-v0.12.0` to verify buildx works end-to-end Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/148	2026-02-10 21:17:14 -08:00
Erich Blume	85e36cd807	Operations and observability for sifaka NAS (#135 ) ## Summary - Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring - Formalize existing `node_exporter` container under Ansible management - Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus - Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime) - Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS - Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles ## Prerequisites before deploy - [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP) - [ ] Verify `ssh eblume@sifaka 'docker ps'` works - [ ] Run `mise run provision-sifaka` to deploy containers - [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes - [ ] `argocd app sync prometheus` + `argocd app sync grafana-config` ## Test plan - [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics` - [ ] Verify Prometheus targets page shows both sifaka jobs as UP - [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135	2026-02-09 17:44:05 -08:00
Erich Blume	3415cad38c	Log real client IPs via Fly-Client-IP header (#130 ) [All checks were successful. Deploy Fly.io Proxy / deploy (push): Successful in 59s] ## Summary - Add `client_ip` field to the Fly.io nginx JSON log format, sourced from `Fly-Client-IP` header - Extract `client_ip` in the Alloy pipeline so it's available as a parsed field in Loki - Keeps `remote_addr` (the internal proxy IP) for debugging Fixes: Grafana access logs for docs.eblu.me showing 172.16.11.178 for every request instead of real visitor IPs. ## Deployment and Testing - [ ] Deploy updated fly.io proxy: `fly deploy` from `fly/` directory - [ ] Verify in Grafana that new log lines include `client_ip` with real IPs - [ ] Confirm `remote_addr` still shows the proxy IP (preserved for debugging) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/130	2026-02-09 11:02:06 -08:00
Forgejo Actions	92a1081302	Update docs release to v1.5.2 - Built changelog from towncrier fragments [skip ci]	2026-02-09 15:30:21 +00:00
Erich Blume	a0b076172f	Fix Immich/Homepage Ingress host matching, add missing service checks (#127 ) ## Summary - Fix Immich Ingress `host: photos` causing 404 with ProxyGroup (same FQDN mismatch as Prometheus/Loki) - Migrate Homepage from old per-service Tailscale proxy to shared ProxyGroup (was the last holdout) - Add Immich and Navidrome to `services-check` HTTP endpoints ## Deployment Notes - Already tested on branch: Immich and Homepage both return 200 via Caddy - Homepage's old Helm-managed Ingress was deleted manually; ArgoCD may recreate it on sync — prune with `argocd app sync homepage --prune` after merge - Old per-service `ts-homepage-*` pod in tailscale namespace can be cleaned up after confirming ProxyGroup works ## Test Plan - [x] `curl https://photos.ops.eblu.me/` returns 200 - [x] `curl https://go.ops.eblu.me/` returns 200 - [ ] `mise run services-check` fully passes after merge Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/127	2026-02-08 22:12:50 -08:00
Erich Blume	e6cf7e47e0	Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126 ) [All checks were successful. Deploy Fly.io Proxy / deploy (push): Successful in 1m8s] ## Summary - Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy - Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test - Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses - Switch Alloy push endpoints from `.ops.eblu.me` (Caddy) to `.tail8d86e.ts.net` (Tailscale Ingress) - Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly ## Manual step (not in PR) Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes. ## Deployment order 1. Pulumi ACLs — `mise run tailnet-preview && mise run tailnet-up` 2. OAuth client — Manual update in Tailscale admin console 3. K8s Ingresses — `argocd app sync apps && argocd app sync docs loki prometheus` 4. Fly.io proxy — `mise run fly-deploy` 5. Verify — `mise run services-check`, check Grafana dashboards ## Test plan - [ ] `mise run tailnet-preview` shows clean diff - [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions - [ ] After deploy: Grafana dashboards show continued log/metric flow - [ ] `curl -sf https://docs.eblu.me` returns 200 - [ ] `mise run services-check` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126	2026-02-08 21:54:18 -08:00
Forgejo Actions	c8d0af6644	Update docs release to v1.5.1 - Built changelog from towncrier fragments [skip ci]	2026-02-08 18:06:46 +00:00
Erich Blume	cc54b4f565	Add Fly.io proxy observability via embedded Alloy (#123 ) [All checks were successful. Deploy Fly.io Proxy / deploy (push): Successful in 1m16s] ## Summary - Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus) - Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting) - Create two Grafana dashboards: Docs APM (per-service view filtered by `host="docs.eblu.me"`) and Fly.io Proxy Health (aggregate proxy health across all upstream services) ## Changed Files \| File \| Change \| \|------\|--------\| \| `fly/nginx.conf` \| Add JSON `log_format` + `access_log`, add `stub_status` endpoint \| \| `fly/Dockerfile` \| COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config \| \| `fly/alloy.river` \| New — Alloy config: log tailing, metric extraction, remote_write \| \| `fly/start.sh` \| Start Alloy after Tailscale, before nginx \| \| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` \| New — Docs APM dashboard \| \| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` \| New — Fly.io Proxy Health dashboard \| \| `argocd/manifests/grafana-config/kustomization.yaml` \| Register new dashboard configmaps \| \| `docs/reference/services/flyio-proxy.md` \| Document observability setup \| ## Deployment and Testing - [ ] `mise run fly-deploy` — rebuild container with Alloy - [ ] `curl https://docs.eblu.me/` — generate traffic - [ ] `fly logs -a blumeops-proxy` — verify Alloy startup - [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}` - [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}` - [ ] `argocd app sync grafana-config` — deploy dashboards - [ ] Verify dashboards show data in Grafana - [ ] `mise run services-check` — no regressions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123	2026-02-08 10:05:38 -08:00
Forgejo Actions	c46d55060d	Update docs release to v1.5.0 - Built changelog from towncrier fragments [skip ci]	2026-02-08 10:37:30 +00:00
Erich Blume	64a78422b1	Add Fly.io public reverse proxy for docs.eblu.me (#120 ) [Some checks failed. Deploy Fly.io Proxy / deploy (push): Failing after 9s] ## Summary - Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale - First service exposed: `docs.eblu.me` — the Quartz static docs site - Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME - Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow ## Key details - Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed - Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts - nginx caches aggressively for the static site; health check is on the default_server block - ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only - DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev` ## Test plan - [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok` - [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status` - [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert - [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected) - [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120	2026-02-08 02:36:19 -08:00
Forgejo Actions	11c76d4768	Update docs release to v1.4.2 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:45:40 +00:00
Forgejo Actions	ab7efd8c1c	Update docs release to v1.4.1 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:27:23 +00:00
Forgejo Actions	3f5017f732	Update docs release to v1.4.0 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:03:34 +00:00
Erich Blume	3b4ff91469	Fix homepage Admin bookmark icons (#110 ) ## Summary - Fix broken Pulumi icon: changed `pulumi` to `si-pulumi` (Simple Icons prefix required) - Fix broken ArgoCD icon: changed `argocd` to `argo-cd` (Dashboard Icons uses hyphenated name) ## Deployment and Testing - [ ] Sync homepage app in ArgoCD - [ ] Verify icons appear on go.ops.eblu.me Admin bookmarks section Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/110	2026-02-05 06:29:39 -08:00
Forgejo Actions	808bc507d8	Update docs release to v1.3.4 - Built changelog from towncrier fragments [skip ci]	2026-02-05 01:22:10 +00:00
Forgejo Actions	a03a9faaad	Update docs release to v1.3.3 - Built changelog from towncrier fragments [skip ci]	2026-02-04 22:40:18 +00:00
Forgejo Actions	e15caec898	Update docs release to v1.3.2 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:47:27 +00:00
Forgejo Actions	4aeade1543	Update docs release to v1.3.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:26:24 +00:00
Forgejo Actions	1835e3e80e	Update docs release to v1.3.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:14:08 +00:00
Erich Blume	1e13d4b83d	Fix Navidrome automatic library scan schedule (#101 ) ## Summary - Fix env var name from `ND_SCANSCHEDULE` to `ND_SCANNER_SCHEDULE` (Navidrome uses viper config where dots become underscores) - Use explicit `@every 1h` format for clarity - Reorder CLAUDE.md rules to emphasize running zk-docs first ## Root Cause Navidrome logs showed "Periodic scan is DISABLED" at startup despite the env var being set. The config key is `scanner.schedule`, which translates to `ND_SCANNER_SCHEDULE` (not `ND_SCANSCHEDULE`). ## Deployment and Testing - [ ] Sync navidrome app: `argocd app sync navidrome` - [ ] Verify pod restarts with new env var - [ ] Check logs for "Scheduling scanner" message instead of "Periodic scan is DISABLED" - [ ] Wait ~1 hour and confirm scan runs automatically 🤖 Generated with [Claude Code](https://claude.ai/code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/101	2026-02-04 07:23:12 -08:00
Forgejo Actions	e405a48881	Update docs release to v1.2.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 05:18:37 +00:00
Forgejo Actions	f88da51e23	Update docs release to v1.2.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:53:30 +00:00
Forgejo Actions	16cdffaebf	Update docs release to v1.1.5 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:34:31 +00:00
Forgejo Actions	e426473c59	Update docs release to v1.1.4 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:18:04 +00:00
Forgejo Actions	672dbda9d7	Update docs release to v1.1.3 - Built changelog from towncrier fragments [skip ci]	2026-02-04 03:07:15 +00:00
Forgejo Actions	f279891575	Update docs release to v1.1.2 - Built changelog from towncrier fragments [skip ci]	2026-02-04 03:02:13 +00:00
Forgejo Actions	81d99b689d	Update docs release to v1.1.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 02:53:17 +00:00
Forgejo Actions	bf03d71780	Update docs release to v1.1.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:27:09 +00:00
Erich Blume	82bcd935cd	Move DOCS_RELEASE_URL from ConfigMap to Deployment This ensures ArgoCD sync triggers a pod rollout when the URL changes, since ConfigMap data changes don't restart pods automatically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:23:52 -08:00
Forgejo Actions	103cc0deab	Update docs release to v1.0.14 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:18:33 +00:00
Erich Blume	aaf5090509	Remove ARGOCD_AUTH_TOKEN from external secret Workflow secrets come from Forgejo's secret store, not runner env. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:17:53 -08:00
Forgejo Actions	492aa9a104	Update docs release to v1.0.13 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:15:22 +00:00
Erich Blume	3a26d7e49a	Update forgejo-runner image to v2.5.0 Fixes argocd CLI download. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:13:37 -08:00
Forgejo Actions	4d3222d91b	Update docs release to v1.0.12 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:07:05 +00:00
Erich Blume	f08595a3c0	Update forgejo-runner image to v2.4.0 Includes uv and argocd CLI for auto-deploy workflow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:05:09 -08:00
Erich Blume	1f73eb675d	Auto-deploy docs from build workflow (#93 ) ## Summary - Add `uv` and `argocd` CLI to forgejo-runner container image - Add `workflow-bot` ArgoCD account with sync permissions (declarative via kustomize patches) - Add `ARGOCD_AUTH_TOKEN` to forgejo-runner external secret for workflow auth - Update build workflow to auto-deploy docs after release: - Update configmap with new release URL - Commit changelog and configmap changes - Sync docs app via ArgoCD ## Deployment and Testing Manual steps required before this can work: 1. [ ] Build and push new forgejo-runner image (v2.4.0) 2. [ ] Sync argocd app to create workflow-bot account 3. [ ] Generate token: `argocd account generate-token --account workflow-bot` 4. [ ] Store token in 1Password under "Forgejo Secrets" with field `argocd_token` 5. [ ] Sync forgejo-runner app to pick up new external secret 6. [ ] Update forgejo-runner deployment to use new image version 7. [ ] Test by running workflow manually 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/93	2026-02-03 16:58:03 -08:00
Erich Blume	7d5e6b032b	Update docs release to v1.0.11 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 16:40:06 -08:00
Erich Blume	31564d1d9a	Update docs release to v1.0.10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 16:32:17 -08:00
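Commits 42ebc2b122 (#159) and 2a04ab26b7 (#160) fix towncrier stamping evening releases with the next day's date because the runner ran in UTC. The minimal sketch below, assuming towncrier takes the release date from the runner's local clock, reproduces the off-by-one day using the v1.6.2 release timestamp from this log (a75089d8ef, 2026-02-12 00:35:02 +00:00). It uses a fixed UTC-8 offset as a stand-in for `America/Los_Angeles`, which is on PST in February, so the sketch does not itself depend on a tz database (the same dependency #160 had to mount into the runner).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for America/Los_Angeles, which is
# on PST (UTC-8) in February. Real code would use zoneinfo and the tz database.
PST = timezone(timedelta(hours=-8), "PST")


def release_dates(utc_stamp: str) -> tuple[str, str]:
    """Return the (UTC, PST) calendar dates for an ISO-8601 UTC timestamp."""
    moment = datetime.fromisoformat(utc_stamp)
    return (
        moment.astimezone(timezone.utc).date().isoformat(),
        moment.astimezone(PST).date().isoformat(),
    )


# The v1.6.2 release commit (a75089d8ef) is stamped 2026-02-12 00:35:02 +00:00.
utc_date, local_date = release_dates("2026-02-12T00:35:02+00:00")
print(f"UTC date: {utc_date}, Los Angeles date: {local_date}")
```

A runner in UTC sees 2026-02-12, while the same instant is 16:35 on 2026-02-11 in Los Angeles, which matches the mismatch described in #159.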

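Commit 1e13d4b83d (#101) traces the disabled Navidrome scan to an env var name mismatch: the config key `scanner.schedule` is read from `ND_SCANNER_SCHEDULE`, not `ND_SCANSCHEDULE`. The mapping rule stated in the commit (dots become underscores, upper-cased, `ND_` prefix) fits in one line; `viper_env_name` is a hypothetical helper that mirrors that rule, not Navidrome's own code.

```python
def viper_env_name(config_key: str, prefix: str = "ND") -> str:
    """Map a dotted config key to the env var name described in #101."""
    # Dots become underscores; the result is upper-cased behind the prefix.
    return f"{prefix}_{config_key.replace('.', '_').upper()}"


env_name = viper_env_name("scanner.schedule")
print(env_name)  # ND_SCANNER_SCHEDULE
# The name the deployment originally used does not match this key.
print(env_name == "ND_SCANSCHEDULE")  # False
```

The commit pairs that variable with the explicit value `@every 1h`.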

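Commit 82bcd935cd moves `DOCS_RELEASE_URL` from a ConfigMap into the Deployment because ConfigMap data changes do not restart pods. A Deployment rolls out new pods when its pod template changes, and a template that only references a ConfigMap by name stays identical when that ConfigMap's data is edited. The sketch below models this with a SHA-256 hash of the template as a stand-in for Kubernetes' own pod-template hash (the real controller uses a different hash function). The ConfigMap name `docs-config`, the use of `envFrom` rather than a per-key reference, and both release URLs are hypothetical.

```python
import hashlib
import json


def template_hash(pod_template: dict) -> str:
    """Deterministic hash of a pod template (a stand-in for pod-template-hash)."""
    canonical = json.dumps(pod_template, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:10]


def rollout_needed(old_template: dict, new_template: dict) -> bool:
    """In this model, a rollout happens only when the template hash changes."""
    return template_hash(old_template) != template_hash(new_template)


def template_with_configmap(configmap_name: str) -> dict:
    # The template only names the ConfigMap; its data lives outside the template.
    return {"containers": [{"name": "docs",
                            "envFrom": [{"configMapRef": {"name": configmap_name}}]}]}


def template_with_inline_env(release_url: str) -> dict:
    # The URL is embedded directly in the template, as after 82bcd935cd.
    return {"containers": [{"name": "docs",
                            "env": [{"name": "DOCS_RELEASE_URL", "value": release_url}]}]}


# Hypothetical release URLs for two consecutive docs releases.
OLD_URL = "https://example.invalid/docs-v1.0.13.tar.gz"
NEW_URL = "https://example.invalid/docs-v1.0.14.tar.gz"

# Editing the ConfigMap's data from OLD_URL to NEW_URL leaves the template untouched.
print(rollout_needed(template_with_configmap("docs-config"),
                     template_with_configmap("docs-config")))  # False
# Changing the inline env var changes the template, so a rollout follows.
print(rollout_needed(template_with_inline_env(OLD_URL),
                     template_with_inline_env(NEW_URL)))  # True
```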
115 commits
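Commit 3415cad38c (#130) adds a `client_ip` field, sourced from the `Fly-Client-IP` header, to the nginx JSON access log while keeping `remote_addr` (the internal proxy address) for debugging. The consuming side can be sketched in Python; the real extraction happens in the Alloy pipeline, which is not reproduced here. The fallback to `remote_addr` for older lines, the `host` and `status` fields, and the visitor address `203.0.113.7` (a documentation IP) are assumptions for illustration.

```python
import json


def visitor_ip(log_line: str) -> str:
    """Return the best available client address from one JSON access-log line."""
    entry = json.loads(log_line)
    # Assumption: prefer client_ip (from Fly-Client-IP) and fall back to
    # remote_addr, which on Fly.io is the proxy-internal address.
    return entry.get("client_ip") or entry.get("remote_addr", "")


# Before #130: only remote_addr, which was the same internal IP for every request.
old_line = '{"host": "docs.eblu.me", "status": 200, "remote_addr": "172.16.11.178"}'
# After #130: client_ip carries the real visitor address.
new_line = ('{"host": "docs.eblu.me", "status": 200, '
            '"remote_addr": "172.16.11.178", "client_ip": "203.0.113.7"}')

print(visitor_ip(old_line))  # 172.16.11.178
print(visitor_ip(new_line))  # 203.0.113.7
```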