Commit graph

339 commits

Author SHA1 Message Date
fdd3f6483a Update forgejo-runner image to v3.2.0
All checks were successful
Build Container / build (push) Successful in 7m31s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:08:57 -08:00
Forgejo Actions
02b1397f1a Update docs release to v1.8.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-13 10:36:04 -08:00
0098ac37e0 Move non-secret runner env vars to deployment spec (#181)
## Summary
- Move FORGEJO_URL, RUNNER_NAME, and RUNNER_LABELS from ExternalSecret template to deployment env vars
- ExternalSecret now only contains the actual secret (RUNNER_TOKEN)
- Image version changes in RUNNER_LABELS now trigger automatic pod rollouts

## Deployment
1. Merge this PR
2. `argocd app sync forgejo-runner` — the deployment spec change will auto-roll the pod

No manual restart needed — that's the whole point :)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/181
2026-02-13 10:29:23 -08:00
52bbf88aa6 Update forgejo-runner image to v3.1.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:21:43 -08:00
4942dee182 Update homepage layout for new Content/Misc groups
Replace old Apps/Observability/Infrastructure layout entries with
Content and Misc to match the recategorized ingress annotations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:16:40 -08:00
ca6a845604 Move ArgoCD to Misc homepage group and rename ingress file
ArgoCD's tailscale ingress was missed in the recategorization (filed as
service-tailscale.yaml instead of ingress-tailscale.yaml). Fix the group
annotation and rename the file to match the convention used by all other
services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:13:32 -08:00
48ce5b4120 Recategorize homepage into Content and Misc groups (#179)
## Summary
- Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups
- **Content**: Immich, Kiwix, Miniflux, DJ, Grafana
- **Misc**: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI

## Deployment and Testing
- [ ] Sync affected ingresses via ArgoCD (all 11 services)
- [ ] Verify homepage shows the two new groups correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179
2026-02-13 09:09:22 -08:00
Forgejo Actions
e21277ae83 Update docs release to v1.8.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 19:20:27 -08:00
9c789a1868 Fix cache hit rate on APM and Fly.io dashboards (#177)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m19s
## Summary
- Remove `match_all = true` from `flyio_nginx_cache_requests_total` in Alloy so the metric only counts requests that go through the proxy cache (excludes health checks with empty `cache_status`)
- Change dashboard queries from `rate(...[5m])` to `increase(...[$__range])` — aggregates over the full dashboard time window instead of a 5-minute sliding window, giving meaningful ratios for low-traffic static sites
- Add null/NaN value mapping to show "No traffic" in neutral color instead of blank/red

## Root cause
Health check requests from Fly.io hit the default nginx server block (no `proxy_cache`), producing entries with empty `upstream_cache_status`. With `match_all = true`, these were counted in the cache metric, diluting the Fly.io dashboard ratio. For APM dashboards, `rate()[5m]` on low-traffic sites with 24h cache validity almost always returns either all-HITs (100%) or no data (blank → red background).

## Deployment
- Fly.io proxy redeploy needed for Alloy config change
- ArgoCD sync for dashboard ConfigMap changes

## Test plan
- [ ] Redeploy Fly.io proxy
- [ ] Sync grafana-config in ArgoCD
- [ ] Verify CV APM cache hit ratio shows a real percentage (not 100%)
- [ ] Verify Docs APM shows "No traffic" in neutral color when idle, real ratio when visited
- [ ] Verify Fly.io proxy dashboard cache ratio excludes health checks

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/177
2026-02-12 18:40:48 -08:00
9717863f65 Update CV release to v1.0.3, add X-Clacks-Overhead header (#176)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m5s
## Summary
- Update CV release URL from v1.0.2 to v1.0.3
- Add `X-Clacks-Overhead: GNU Terry Pratchett` header to both `docs.eblu.me` and `cv.eblu.me` server blocks in the Fly.io proxy nginx config

## Deployment and Testing
- [ ] Sync CV app: `argocd app sync cv`
- [ ] Verify CV is serving v1.0.3 content
- [ ] Deploy fly proxy (workflow or `mise run fly-deploy`)
- [ ] Verify header: `curl -sI https://docs.eblu.me | grep -i clacks`
- [ ] Verify header: `curl -sI https://cv.eblu.me | grep -i clacks`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/176
2026-02-12 17:08:22 -08:00
ed5c9c9b48 Update CV release to v1.0.2 (#175)
## Summary
- Update `CV_RELEASE_URL` in cv deployment from v1.0.1 to v1.0.2

## Deployment and Testing
- [ ] `argocd app sync cv` after merge
- [ ] Verify cv.eblu.me serves updated content

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/175
2026-02-12 16:18:55 -08:00
Forgejo Actions
70d8881959 Update docs release to v1.7.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 14:13:12 -08:00
7dc03c0af1 Add CV to services-check, update homepage link (#174)
## Summary
- Add CV to services-check (tailnet endpoint + public cv.eblu.me)
- Update CV homepage annotation to point to cv.eblu.me instead of cv.ops.eblu.me

## Deployment and Testing
- [ ] `argocd app sync cv` (homepage link change)
- [ ] `mise run services-check` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/174
2026-02-12 14:10:03 -08:00
df372fccb6 Expose CV publicly at cv.eblu.me (#173)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m57s
## Summary
- Add nginx server block for `cv.eblu.me` (static site, same pattern as docs)
- Add DNS CNAME record in Pulumi (`cv.eblu.me` → `blumeops-proxy.fly.dev`)
- Add `cv.eblu.me` cert to `fly-setup` mise task
- Tag CV Tailscale ingress with `tag:flyio-target` for ACL access
- Remove `/_error` test endpoint from docs proxy

## Deployment and Testing
- [ ] `argocd app set cv --revision cv/public-cv-eblu-me && argocd app sync cv`
- [ ] `fly certs add cv.eblu.me -a blumeops-proxy`
- [ ] `mise run fly-deploy`
- [ ] Verify proxy: `curl -I -H "Host: cv.eblu.me" https://blumeops-proxy.fly.dev/`
- [ ] `mise run dns-preview` then `mise run dns-up`
- [ ] Verify live: `curl -I https://cv.eblu.me`
- [ ] Merge, then `argocd app set cv --revision main && argocd app sync cv`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/173
2026-02-12 14:05:00 -08:00
a68542a602 Update CV release to v1.0.1 (#172)
## Summary
- Update `CV_RELEASE_URL` from v0.1.0 to v1.0.1 in the CV deployment manifest

## Deployment and Testing
- [ ] `argocd app sync cv` after merge
- [ ] Verify cv.ops.eblu.me serves updated resume

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/172
2026-02-12 13:38:05 -08:00
Forgejo Actions
200be39492 Update docs release to v1.7.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 11:46:38 -08:00
01e19023ee Add CV/resume web app at cv.ops.eblu.me (#169)
## Summary
- nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz)
- ArgoCD app + k8s manifests (deployment, service, Tailscale ingress)
- Caddy route for `cv.ops.eblu.me`
- Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD
- Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv)

## Deployment steps (after merge)
1. `mise run container-tag-and-release cv v1.0.0`
2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`)
3. Run "Deploy CV" workflow in blumeops (default: latest)
4. `mise run provision-indri -- --tags caddy`
5. Verify at `https://cv.ops.eblu.me/`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169
2026-02-12 11:09:41 -08:00
Forgejo Actions
a800bdc8b9 Update docs release to v1.6.9
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:28:40 -08:00
Forgejo Actions
b36b30ef7a Update docs release to v1.6.8
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:12:50 -08:00
Forgejo Actions
0528a6f712 Update docs release to v1.6.7
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 18:07:11 -08:00
Forgejo Actions
9e0487b523 Update docs release to v1.6.6
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 17:57:59 -08:00
20a25557d6 Bump runner image to v3.0.2 (restore Docker CLI)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:53:04 -08:00
0006e6bf17 Bump runner image to v3.0.1 (restore Node.js)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:38:45 -08:00
3d84483513 Update runner job image to forgejo-runner:v3.0.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:27:50 -08:00
95364dcb48 Simplify runner image (Dagger Phase 3) (#162)
All checks were successful
Build Container / build (push) Successful in 1m13s
## Summary

With Phases 1 and 2 complete, the runner image no longer needs most of its bundled tools. This PR strips it down and adds what was missing.

**Removed** (now inside Dagger containers):
- Node.js 24.x
- Docker CLI + buildx plugin
- skopeo
- gnupg, lsb-release, xz-utils

**Added:**
- `tzdata` — fixes the TZ env var (#159, #160, #161) so `TZ=America/Los_Angeles` actually works
- `flyctl` — was being installed from scratch every release

**Workflow changes:**
- Remove "Ensure Dagger CLI" bootstrap steps from both workflows (Dagger is in the image)
- Remove "Install flyctl" step from build-blumeops (flyctl is in the image)
- Remove job-level `TZ` from build-blumeops (moved to runner configmap `runner.envs`)
- Set `TZ: America/Los_Angeles` in runner configmap so all job containers inherit it

## Deployment

After merge:
1. Build and release the new runner image: `mise run container-release forgejo-runner v2.0.0`
2. Sync the runner: `argocd app sync forgejo-runner`
3. Verify: `kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date` (but the real test is running a docs release and checking the changelog date)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/162
2026-02-11 17:24:20 -08:00
Forgejo Actions
996afbcf6f Update docs release to v1.6.5
[skip ci]
2026-02-11 17:10:29 -08:00
Forgejo Actions
6ce03df819 Update docs release to v1.6.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 01:01:23 +00:00
2a04ab26b7 Mount host zoneinfo into runner for TZ support (#160)
## Summary

The `TZ=America/Los_Angeles` env var from #159 has no effect because the `forgejo/runner` image doesn't ship tzdata. Mount the node's `/usr/share/zoneinfo` into the container so the timezone database is available.

## Deployment

After merge, sync forgejo-runner and verify:
```
argocd app sync forgejo-runner
kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date
# Should show PST/PDT, not UTC
```

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/160
2026-02-11 16:57:11 -08:00
42ebc2b122 Fix Forgejo runner timezone (UTC -> America/Los_Angeles) (#159)
## Summary

- Set `TZ=America/Los_Angeles` on the Forgejo runner container

The runner pod defaults to UTC. When releases are cut in the evening PST, towncrier stamps changelog entries with tomorrow's date (e.g., v1.6.2 shows 2026-02-12 despite being released on the evening of Feb 11 PST).

## Deployment

After merge, sync the forgejo-runner ArgoCD app:
```
argocd app sync forgejo-runner
```

The runner pod will restart with the new timezone. Note: the v1.6.2 changelog entry will remain dated 2026-02-12; future entries will use PST dates, so dates may appear non-sequential once.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/159
2026-02-11 16:53:41 -08:00
Forgejo Actions
e5d1e795e0 Update docs release to v1.6.3
[skip ci]
2026-02-12 00:46:35 +00:00
Forgejo Actions
a75089d8ef Update docs release to v1.6.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 00:35:02 +00:00
1bc2b421a8 Adopt Dagger CI for container builds (Phase 1) (#156)
All checks were successful
Build Container / build (push) Successful in 13s
## Summary

- Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images
- Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml`
- BuildKit's native push is compatible with Zot — **skopeo workaround eliminated**
- Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0
- Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner)
- Delete old `.forgejo/actions/build-push-image/` composite action
- Add GPLv3 LICENSE

## Verified locally

- `dagger call build --src=. --container-name=nettest` — builds ✓
- `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓
- `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓
- Dagger CLI accessible inside built runner image ✓

## Deployment sequence (after merge)

1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner
2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in)
3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline
4. `mise run container-list` — verify tags

## Not included (future phases)

- Phase 2: docs build + Forgejo packages migration
- Phase 3: runner simplification (remove skopeo, Node.js, etc.)
- Phase 4: future workflows

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156
2026-02-11 15:38:31 -08:00
Forgejo Actions
362ae22ab7 Update docs release to v1.6.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:37:34 +00:00
Forgejo Actions
eca01a9546 Update docs release to v1.6.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:33:57 +00:00
Forgejo Actions
ab6661f5dd Update docs release to v1.5.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 20:17:12 +00:00
Forgejo Actions
a106f92c38 Update docs release to v1.5.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 15:53:49 +00:00
f0ac04fb8a Bootstrap buildx: revert to docker build, bump runner to v2.5.1 (#148)
All checks were successful
Build Container / build (push) Successful in 1m56s
## Summary
- Temporarily revert composite action to `docker build` so we can build the runner image (chicken-and-egg: current runner v2.5.0 doesn't have buildx)
- Bump runner label to `v2.5.1` so after sync the new runner image (with buildx) gets used

## Deployment plan
1. Merge this PR
2. Tag `forgejo-runner-v2.5.1` — builds with legacy `docker build` (one last time)
3. Sync forgejo-runner in ArgoCD to pick up the v2.5.1 label
4. Follow-up PR: switch action back to `docker buildx build`
5. Tag `nettest-v0.12.0` to verify buildx works end-to-end

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/148
2026-02-10 21:17:14 -08:00
85e36cd807 Operations and observability for sifaka NAS (#135)
## Summary
- Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring
- Formalize existing `node_exporter` container under Ansible management
- Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus
- Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
- Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS
- Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles

## Prerequisites before deploy
- [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
- [ ] Verify `ssh eblume@sifaka 'docker ps'` works
- [ ] Run `mise run provision-sifaka` to deploy containers
- [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes
- [ ] `argocd app sync prometheus` + `argocd app sync grafana-config`

## Test plan
- [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics`
- [ ] Verify Prometheus targets page shows both sifaka jobs as UP
- [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135
2026-02-09 17:44:05 -08:00
3415cad38c Log real client IPs via Fly-Client-IP header (#130)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 59s
## Summary
- Add `client_ip` field to the Fly.io nginx JSON log format, sourced from `Fly-Client-IP` header
- Extract `client_ip` in the Alloy pipeline so it's available as a parsed field in Loki
- Keeps `remote_addr` (the internal proxy IP) for debugging

Fixes: Grafana access logs for docs.eblu.me showing 172.16.11.178 for every request instead of real visitor IPs.

## Deployment and Testing
- [ ] Deploy updated fly.io proxy: `fly deploy` from `fly/` directory
- [ ] Verify in Grafana that new log lines include `client_ip` with real IPs
- [ ] Confirm `remote_addr` still shows the proxy IP (preserved for debugging)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/130
2026-02-09 11:02:06 -08:00
Forgejo Actions
92a1081302 Update docs release to v1.5.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-09 15:30:21 +00:00
a0b076172f Fix Immich/Homepage Ingress host matching, add missing service checks (#127)
## Summary

- Fix Immich Ingress `host: photos` causing 404 with ProxyGroup (same FQDN mismatch as Prometheus/Loki)
- Migrate Homepage from old per-service Tailscale proxy to shared ProxyGroup (was the last holdout)
- Add Immich and Navidrome to `services-check` HTTP endpoints

## Deployment Notes

- Already tested on branch: Immich and Homepage both return 200 via Caddy
- Homepage's old Helm-managed Ingress was deleted manually; ArgoCD may recreate it on sync — prune with `argocd app sync homepage --prune` after merge
- Old per-service `ts-homepage-*` pod in tailscale namespace can be cleaned up after confirming ProxyGroup works

## Test Plan

- [x] `curl https://photos.ops.eblu.me/` returns 200
- [x] `curl https://go.ops.eblu.me/` returns 200
- [ ] `mise run services-check` fully passes after merge

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/127
2026-02-08 22:12:50 -08:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
Forgejo Actions
c8d0af6644 Update docs release to v1.5.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 18:06:46 +00:00
cc54b4f565 Add Fly.io proxy observability via embedded Alloy (#123)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m16s
## Summary

- Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus)
- Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting)
- Create two Grafana dashboards: **Docs APM** (per-service view filtered by `host="docs.eblu.me"`) and **Fly.io Proxy Health** (aggregate proxy health across all upstream services)

## Changed Files

| File | Change |
|------|--------|
| `fly/nginx.conf` | Add JSON `log_format` + `access_log`, add `stub_status` endpoint |
| `fly/Dockerfile` | COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config |
| `fly/alloy.river` | **New** — Alloy config: log tailing, metric extraction, remote_write |
| `fly/start.sh` | Start Alloy after Tailscale, before nginx |
| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` | **New** — Docs APM dashboard |
| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` | **New** — Fly.io Proxy Health dashboard |
| `argocd/manifests/grafana-config/kustomization.yaml` | Register new dashboard configmaps |
| `docs/reference/services/flyio-proxy.md` | Document observability setup |

## Deployment and Testing

- [ ] `mise run fly-deploy` — rebuild container with Alloy
- [ ] `curl https://docs.eblu.me/` — generate traffic
- [ ] `fly logs -a blumeops-proxy` — verify Alloy startup
- [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}`
- [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}`
- [ ] `argocd app sync grafana-config` — deploy dashboards
- [ ] Verify dashboards show data in Grafana
- [ ] `mise run services-check` — no regressions

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123
2026-02-08 10:05:38 -08:00
Forgejo Actions
c46d55060d Update docs release to v1.5.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 10:37:30 +00:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
Forgejo Actions
11c76d4768 Update docs release to v1.4.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:45:40 +00:00
Forgejo Actions
ab7efd8c1c Update docs release to v1.4.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:27:23 +00:00
Forgejo Actions
3f5017f732 Update docs release to v1.4.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:03:34 +00:00
3b4ff91469 Fix homepage Admin bookmark icons (#110)
## Summary
- Fix broken Pulumi icon: changed `pulumi` to `si-pulumi` (Simple Icons prefix required)
- Fix broken ArgoCD icon: changed `argocd` to `argo-cd` (Dashboard Icons uses hyphenated name)

## Deployment and Testing
- [ ] Sync homepage app in ArgoCD
- [ ] Verify icons appear on go.ops.eblu.me Admin bookmarks section

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/110
2026-02-05 06:29:39 -08:00