blumeops

Author	SHA1	Message	Date
Forgejo Actions	996afbcf6f	Update docs release to v1.6.5 [skip ci]	2026-02-11 17:10:29 -08:00
Forgejo Actions	6ce03df819	Update docs release to v1.6.4 - Built changelog from towncrier fragments [skip ci]	2026-02-12 01:01:23 +00:00
Erich Blume	2a04ab26b7	Mount host zoneinfo into runner for TZ support (#160) ## Summary The `TZ=America/Los_Angeles` env var from #159 has no effect because the `forgejo/runner` image doesn't ship tzdata. Mount the node's `/usr/share/zoneinfo` into the container so the timezone database is available. ## Deployment After merge, sync forgejo-runner and verify: ``` argocd app sync forgejo-runner kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date # Should show PST/PDT, not UTC ``` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/160	2026-02-11 16:57:11 -08:00
Erich Blume	42ebc2b122	Fix Forgejo runner timezone (UTC -> America/Los_Angeles) (#159) ## Summary - Set `TZ=America/Los_Angeles` on the Forgejo runner container The runner pod defaults to UTC. When releases are cut in the evening PST, towncrier stamps changelog entries with tomorrow's date (e.g., v1.6.2 shows 2026-02-12 despite being released on the evening of Feb 11 PST). ## Deployment After merge, sync the forgejo-runner ArgoCD app: ``` argocd app sync forgejo-runner ``` The runner pod will restart with the new timezone. Note: the v1.6.2 changelog entry will remain dated 2026-02-12; future entries will use PST dates, so dates may appear non-sequential once. Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/159	2026-02-11 16:53:41 -08:00
Forgejo Actions	e5d1e795e0	Update docs release to v1.6.3 [skip ci]	2026-02-12 00:46:35 +00:00
Forgejo Actions	a75089d8ef	Update docs release to v1.6.2 - Built changelog from towncrier fragments [skip ci]	2026-02-12 00:35:02 +00:00
Erich Blume	1bc2b421a8	Adopt Dagger CI for container builds (Phase 1) (#156) ## Summary - Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images - Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml` - BuildKit's native push is compatible with Zot — skopeo workaround eliminated - Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0 - Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner) - Delete old `.forgejo/actions/build-push-image/` composite action - Add GPLv3 LICENSE ## Verified locally - `dagger call build --src=. --container-name=nettest` — builds ✓ - `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓ - `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓ - Dagger CLI accessible inside built runner image ✓ ## Deployment sequence (after merge) 1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner 2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in) 3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline 4. `mise run container-list` — verify tags ## Not included (future phases) - Phase 2: docs build + Forgejo packages migration - Phase 3: runner simplification (remove skopeo, Node.js, etc.) - Phase 4: future workflows Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156	2026-02-11 15:38:31 -08:00
Forgejo Actions	362ae22ab7	Update docs release to v1.6.1 - Built changelog from towncrier fragments [skip ci]	2026-02-11 21:37:34 +00:00
Forgejo Actions	eca01a9546	Update docs release to v1.6.0 - Built changelog from towncrier fragments [skip ci]	2026-02-11 21:33:57 +00:00
Forgejo Actions	ab6661f5dd	Update docs release to v1.5.4 - Built changelog from towncrier fragments [skip ci]	2026-02-11 20:17:12 +00:00
Forgejo Actions	a106f92c38	Update docs release to v1.5.3 - Built changelog from towncrier fragments [skip ci]	2026-02-11 15:53:49 +00:00
Erich Blume	f0ac04fb8a	Bootstrap buildx: revert to docker build, bump runner to v2.5.1 (#148) ## Summary - Temporarily revert composite action to `docker build` so we can build the runner image (chicken-and-egg: current runner v2.5.0 doesn't have buildx) - Bump runner label to `v2.5.1` so after sync the new runner image (with buildx) gets used ## Deployment plan 1. Merge this PR 2. Tag `forgejo-runner-v2.5.1` — builds with legacy `docker build` (one last time) 3. Sync forgejo-runner in ArgoCD to pick up the v2.5.1 label 4. Follow-up PR: switch action back to `docker buildx build` 5. Tag `nettest-v0.12.0` to verify buildx works end-to-end Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/148	2026-02-10 21:17:14 -08:00
Erich Blume	85e36cd807	Operations and observability for sifaka NAS (#135) ## Summary - Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring - Formalize existing `node_exporter` container under Ansible management - Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus - Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime) - Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS - Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles ## Prerequisites before deploy - [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP) - [ ] Verify `ssh eblume@sifaka 'docker ps'` works - [ ] Run `mise run provision-sifaka` to deploy containers - [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes - [ ] `argocd app sync prometheus` + `argocd app sync grafana-config` ## Test plan - [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics` - [ ] Verify Prometheus targets page shows both sifaka jobs as UP - [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135	2026-02-09 17:44:05 -08:00
Erich Blume	3415cad38c	Log real client IPs via Fly-Client-IP header (#130) ## Summary - Add `client_ip` field to the Fly.io nginx JSON log format, sourced from `Fly-Client-IP` header - Extract `client_ip` in the Alloy pipeline so it's available as a parsed field in Loki - Keeps `remote_addr` (the internal proxy IP) for debugging Fixes: Grafana access logs for docs.eblu.me showing 172.16.11.178 for every request instead of real visitor IPs. ## Deployment and Testing - [ ] Deploy updated fly.io proxy: `fly deploy` from `fly/` directory - [ ] Verify in Grafana that new log lines include `client_ip` with real IPs - [ ] Confirm `remote_addr` still shows the proxy IP (preserved for debugging) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/130	2026-02-09 11:02:06 -08:00
Forgejo Actions	92a1081302	Update docs release to v1.5.2 - Built changelog from towncrier fragments [skip ci]	2026-02-09 15:30:21 +00:00
Erich Blume	a0b076172f	Fix Immich/Homepage Ingress host matching, add missing service checks (#127) ## Summary - Fix Immich Ingress `host: photos` causing 404 with ProxyGroup (same FQDN mismatch as Prometheus/Loki) - Migrate Homepage from old per-service Tailscale proxy to shared ProxyGroup (was the last holdout) - Add Immich and Navidrome to `services-check` HTTP endpoints ## Deployment Notes - Already tested on branch: Immich and Homepage both return 200 via Caddy - Homepage's old Helm-managed Ingress was deleted manually; ArgoCD may recreate it on sync — prune with `argocd app sync homepage --prune` after merge - Old per-service `ts-homepage-*` pod in tailscale namespace can be cleaned up after confirming ProxyGroup works ## Test Plan - [x] `curl https://photos.ops.eblu.me/` returns 200 - [x] `curl https://go.ops.eblu.me/` returns 200 - [ ] `mise run services-check` fully passes after merge Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/127	2026-02-08 22:12:50 -08:00
Erich Blume	e6cf7e47e0	Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126) ## Summary - Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy - Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test - Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses - Switch Alloy push endpoints from `.ops.eblu.me` (Caddy) to `.tail8d86e.ts.net` (Tailscale Ingress) - Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly ## Manual step (not in PR) Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes. ## Deployment order 1. Pulumi ACLs — `mise run tailnet-preview && mise run tailnet-up` 2. OAuth client — Manual update in Tailscale admin console 3. K8s Ingresses — `argocd app sync apps && argocd app sync docs loki prometheus` 4. Fly.io proxy — `mise run fly-deploy` 5. Verify — `mise run services-check`, check Grafana dashboards ## Test plan - [ ] `mise run tailnet-preview` shows clean diff - [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions - [ ] After deploy: Grafana dashboards show continued log/metric flow - [ ] `curl -sf https://docs.eblu.me` returns 200 - [ ] `mise run services-check` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126	2026-02-08 21:54:18 -08:00
Forgejo Actions	c8d0af6644	Update docs release to v1.5.1 - Built changelog from towncrier fragments [skip ci]	2026-02-08 18:06:46 +00:00
Erich Blume	cc54b4f565	Add Fly.io proxy observability via embedded Alloy (#123) ## Summary - Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus) - Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting) - Create two Grafana dashboards: Docs APM (per-service view filtered by `host="docs.eblu.me"`) and Fly.io Proxy Health (aggregate proxy health across all upstream services) ## Changed Files | File | Change | |------|--------| | `fly/nginx.conf` | Add JSON `log_format` + `access_log`, add `stub_status` endpoint | | `fly/Dockerfile` | COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config | | `fly/alloy.river` | New — Alloy config: log tailing, metric extraction, remote_write | | `fly/start.sh` | Start Alloy after Tailscale, before nginx | | `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` | New — Docs APM dashboard | | `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` | New — Fly.io Proxy Health dashboard | | `argocd/manifests/grafana-config/kustomization.yaml` | Register new dashboard configmaps | | `docs/reference/services/flyio-proxy.md` | Document observability setup | ## Deployment and Testing - [ ] `mise run fly-deploy` — rebuild container with Alloy - [ ] `curl https://docs.eblu.me/` — generate traffic - [ ] `fly logs -a blumeops-proxy` — verify Alloy startup - [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}` - [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}` - [ ] `argocd app sync grafana-config` — deploy dashboards - [ ] Verify dashboards show data in Grafana - [ ] `mise run services-check` — no regressions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123	2026-02-08 10:05:38 -08:00
Forgejo Actions	c46d55060d	Update docs release to v1.5.0 - Built changelog from towncrier fragments [skip ci]	2026-02-08 10:37:30 +00:00
Erich Blume	64a78422b1	Add Fly.io public reverse proxy for docs.eblu.me (#120) ## Summary - Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale - First service exposed: `docs.eblu.me` — the Quartz static docs site - Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME - Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow ## Key details - Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed - Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts - nginx caches aggressively for the static site; health check is on the default_server block - ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only - DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev` ## Test plan - [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok` - [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status` - [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert - [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected) - [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120	2026-02-08 02:36:19 -08:00
Forgejo Actions	11c76d4768	Update docs release to v1.4.2 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:45:40 +00:00
Forgejo Actions	ab7efd8c1c	Update docs release to v1.4.1 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:27:23 +00:00
Forgejo Actions	3f5017f732	Update docs release to v1.4.0 - Built changelog from towncrier fragments [skip ci]	2026-02-08 05:03:34 +00:00
Erich Blume	3b4ff91469	Fix homepage Admin bookmark icons (#110) ## Summary - Fix broken Pulumi icon: changed `pulumi` to `si-pulumi` (Simple Icons prefix required) - Fix broken ArgoCD icon: changed `argocd` to `argo-cd` (Dashboard Icons uses hyphenated name) ## Deployment and Testing - [ ] Sync homepage app in ArgoCD - [ ] Verify icons appear on go.ops.eblu.me Admin bookmarks section Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/110	2026-02-05 06:29:39 -08:00
Forgejo Actions	808bc507d8	Update docs release to v1.3.4 - Built changelog from towncrier fragments [skip ci]	2026-02-05 01:22:10 +00:00
Forgejo Actions	a03a9faaad	Update docs release to v1.3.3 - Built changelog from towncrier fragments [skip ci]	2026-02-04 22:40:18 +00:00
Forgejo Actions	e15caec898	Update docs release to v1.3.2 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:47:27 +00:00
Forgejo Actions	4aeade1543	Update docs release to v1.3.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:26:24 +00:00
Forgejo Actions	1835e3e80e	Update docs release to v1.3.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 16:14:08 +00:00
Erich Blume	1e13d4b83d	Fix Navidrome automatic library scan schedule (#101) ## Summary - Fix env var name from `ND_SCANSCHEDULE` to `ND_SCANNER_SCHEDULE` (Navidrome uses viper config where dots become underscores) - Use explicit `@every 1h` format for clarity - Reorder CLAUDE.md rules to emphasize running zk-docs first ## Root Cause Navidrome logs showed "Periodic scan is DISABLED" at startup despite the env var being set. The config key is `scanner.schedule`, which translates to `ND_SCANNER_SCHEDULE` (not `ND_SCANSCHEDULE`). ## Deployment and Testing - [ ] Sync navidrome app: `argocd app sync navidrome` - [ ] Verify pod restarts with new env var - [ ] Check logs for "Scheduling scanner" message instead of "Periodic scan is DISABLED" - [ ] Wait ~1 hour and confirm scan runs automatically 🤖 Generated with [Claude Code](https://claude.ai/code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/101	2026-02-04 07:23:12 -08:00
Forgejo Actions	e405a48881	Update docs release to v1.2.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 05:18:37 +00:00
Forgejo Actions	f88da51e23	Update docs release to v1.2.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:53:30 +00:00
Forgejo Actions	16cdffaebf	Update docs release to v1.1.5 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:34:31 +00:00
Forgejo Actions	e426473c59	Update docs release to v1.1.4 - Built changelog from towncrier fragments [skip ci]	2026-02-04 04:18:04 +00:00
Forgejo Actions	672dbda9d7	Update docs release to v1.1.3 - Built changelog from towncrier fragments [skip ci]	2026-02-04 03:07:15 +00:00
Forgejo Actions	f279891575	Update docs release to v1.1.2 - Built changelog from towncrier fragments [skip ci]	2026-02-04 03:02:13 +00:00
Forgejo Actions	81d99b689d	Update docs release to v1.1.1 - Built changelog from towncrier fragments [skip ci]	2026-02-04 02:53:17 +00:00
Forgejo Actions	bf03d71780	Update docs release to v1.1.0 - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:27:09 +00:00
Erich Blume	82bcd935cd	Move DOCS_RELEASE_URL from ConfigMap to Deployment This ensures ArgoCD sync triggers a pod rollout when the URL changes, since ConfigMap data changes don't restart pods automatically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:23:52 -08:00
Forgejo Actions	103cc0deab	Update docs release to v1.0.14 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:18:33 +00:00
Erich Blume	aaf5090509	Remove ARGOCD_AUTH_TOKEN from external secret Workflow secrets come from Forgejo's secret store, not runner env. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:17:53 -08:00
Forgejo Actions	492aa9a104	Update docs release to v1.0.13 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:15:22 +00:00
Erich Blume	3a26d7e49a	Update forgejo-runner image to v2.5.0 Fixes argocd CLI download. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:13:37 -08:00
Forgejo Actions	4d3222d91b	Update docs release to v1.0.12 - Updated configmap with new DOCS_RELEASE_URL - Built changelog from towncrier fragments [skip ci]	2026-02-04 01:07:05 +00:00
Erich Blume	f08595a3c0	Update forgejo-runner image to v2.4.0 Includes uv and argocd CLI for auto-deploy workflow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:05:09 -08:00
Erich Blume	1f73eb675d	Auto-deploy docs from build workflow (#93) ## Summary - Add `uv` and `argocd` CLI to forgejo-runner container image - Add `workflow-bot` ArgoCD account with sync permissions (declarative via kustomize patches) - Add `ARGOCD_AUTH_TOKEN` to forgejo-runner external secret for workflow auth - Update build workflow to auto-deploy docs after release: - Update configmap with new release URL - Commit changelog and configmap changes - Sync docs app via ArgoCD ## Deployment and Testing Manual steps required before this can work: 1. [ ] Build and push new forgejo-runner image (v2.4.0) 2. [ ] Sync argocd app to create workflow-bot account 3. [ ] Generate token: `argocd account generate-token --account workflow-bot` 4. [ ] Store token in 1Password under "Forgejo Secrets" with field `argocd_token` 5. [ ] Sync forgejo-runner app to pick up new external secret 6. [ ] Update forgejo-runner deployment to use new image version 7. [ ] Test by running workflow manually 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/93	2026-02-03 16:58:03 -08:00
Erich Blume	7d5e6b032b	Update docs release to v1.0.11 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 16:40:06 -08:00
Erich Blume	31564d1d9a	Update docs release to v1.0.10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 16:32:17 -08:00
Erich Blume	d359583d0a	Update docs release to v1.0.9 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 16:23:02 -08:00
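
The runner timezone fix in #159 and #160 comes down to which calendar date a UTC instant falls on. The following is a minimal Python sketch (Python being the language of the repo's `.dagger/` module), assuming, as #159 describes, that the changelog date is taken from the runner's local clock. `changelog_date` and `pacific` are hypothetical helpers, not part of towncrier or blumeops; the input is the timestamp of the v1.6.2 docs release commit listed above.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def pacific() -> tzinfo:
    """Hypothetical helper: America/Los_Angeles if a tz database is available."""
    try:
        return ZoneInfo("America/Los_Angeles")
    except ZoneInfoNotFoundError:
        # No tz database (cf. #160: the runner image lacks tzdata).
        # Fall back to a fixed UTC-8 offset, which is right for PST but not PDT.
        return timezone(timedelta(hours=-8), "PST")


def changelog_date(released_at: datetime, tz: tzinfo) -> str:
    """Hypothetical helper: the date a process running in `tz` sees for this instant."""
    return released_at.astimezone(tz).date().isoformat()


# Timestamp of the v1.6.2 docs release commit (a75089d8ef), in UTC.
v162 = datetime(2026, 2, 12, 0, 35, 2, tzinfo=timezone.utc)

print(changelog_date(v162, timezone.utc))  # 2026-02-12 (runner default, UTC)
print(changelog_date(v162, pacific()))     # 2026-02-11 (America/Los_Angeles)
```

The `ZoneInfoNotFoundError` branch is the Python analogue of the problem #160 addresses: naming a zone is not enough if the zone database is missing from the container.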

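The Navidrome fix in #101 depends on how a dotted config key becomes an environment variable name. A minimal Python sketch of that mapping, assuming (as the commit describes) an `ND` prefix joined with `_` and a replacer that turns `.` into `_` before upper-casing; `viper_env_name` is a hypothetical helper, not viper's or Navidrome's API:

```python
def viper_env_name(config_key: str, prefix: str = "ND") -> str:
    """Hypothetical helper: env var name for a dotted config key.

    Assumes prefix "ND", dots replaced by underscores, and upper-casing,
    which is the mapping #101 relies on.
    """
    return f"{prefix}_{config_key.replace('.', '_').upper()}"


print(viper_env_name("scanner.schedule"))  # ND_SCANNER_SCHEDULE
```

Under this mapping, `ND_SCANSCHEDULE` corresponds to a key with no dot in it, which is why setting it left `scanner.schedule` unset and the periodic scan disabled.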
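
#130 keeps both `remote_addr` (the Fly.io proxy's internal address) and the new `client_ip` field in each nginx JSON log line. A Python sketch of the precedence a log consumer might apply; the sample log lines are hypothetical (203.0.113.0/24 is a documentation range), and treating an empty or `-` value as "header absent" is an assumption about how nginx renders a missing `Fly-Client-IP` header. The real extraction happens in the Alloy pipeline, which this does not reproduce.

```python
import json


def visitor_ip(log_line: str) -> str:
    """Return the real visitor IP from one nginx JSON access-log line."""
    entry = json.loads(log_line)
    client_ip = entry.get("client_ip", "")
    # Assumption: a missing Fly-Client-IP header shows up as "" or "-".
    if client_ip in ("", "-"):
        # Fall back to the proxy-side address, which #130 keeps for debugging.
        return entry["remote_addr"]
    return client_ip


# Hypothetical log lines; only the field names come from #130 and #123.
with_header = '{"remote_addr": "172.16.11.178", "client_ip": "203.0.113.7", "host": "docs.eblu.me"}'
without_header = '{"remote_addr": "172.16.11.178", "client_ip": "", "host": "docs.eblu.me"}'

print(visitor_ip(with_header))     # 203.0.113.7
print(visitor_ip(without_header))  # 172.16.11.178
```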

114 commits