blumeops

Author	SHA1	Message	Date
Erich Blume	61ca1ca305	Deploy Tailscale operator on ringtail k3s cluster (#215 ) ## Summary - Extract shared Tailscale operator resources (CRDs, RBAC, Deployment, ProxyClass, DNSConfig) into `tailscale-operator-base/` so both clusters reference the same manifests - Add `tailscale-operator-ringtail/` overlay with 1-replica ProxyGroup and ExternalSecret for the shared OAuth client - Add ArgoCD Application targeting `ringtail.tail8d86e.ts.net:6443` - Update `.yamllint.yaml` ignore path for the moved `operator.yaml` ## Deployment and Testing - [ ] Sync `apps` app to pick up the new Application definition - [ ] `argocd app sync tailscale-operator-ringtail` - [ ] Verify ExternalSecret syncs: `kubectl --context=k3s-ringtail -n tailscale get externalsecret` - [ ] Verify operator pod runs: `kubectl --context=k3s-ringtail -n tailscale get pods` - [ ] Verify ProxyGroup ready: `kubectl --context=k3s-ringtail -n tailscale get proxygroups` - [ ] Verify indri operator still works: `argocd app diff tailscale-operator` - [ ] Check Tailscale admin for new operator device with `tag:k8s-operator` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/215	2026-02-19 09:33:05 -08:00
Erich Blume	630ebcd12d	Add ringtail DeviceTags and homelab-to-homelab SSH rule (#210 ) ## Summary - Add `ringtail` DeviceTags Pulumi resource with `tag:homelab` + `tag:blumeops` (matching indri/sifaka pattern) - Remove the bootstrap `ringtail_key` auth key — ringtail is already on the tailnet - Add SSH ACL rule allowing `tag:homelab` → `tag:homelab` SSH, unblocking cross-host management (e.g., ringtail running ansible against indri) ## Deployment and Testing - [ ] `mise run tailnet-preview` — dry run, confirm diff - [ ] `mise run tailnet-up` — apply - [ ] From ringtail: `ssh indri 'hostname'` — should succeed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/210	2026-02-18 21:48:11 -08:00
Erich Blume	b9d813cde1	Add NixOS configuration for ringtail workstation (#207 ) ## Summary - NixOS flake for ringtail (gaming/compute workstation, RTX 4080) in `nixos/ringtail/` - Declarative disk partitioning via disko (GPT, 512M EFI + ext4 root on NVMe) - NVIDIA proprietary drivers, sway/Wayland desktop, greetd, PipeWire, Steam - Tailscale integration for tailnet connectivity - Ansible playbook + `mise run provision-ringtail` for ongoing management - Pulumi auth key (`tag:homelab`, `tag:blumeops`) for tailnet bootstrap ## Deployment Order 1. Merge PR 2. `pulumi up` in tailscale stack → creates auth key 3. Retrieve auth key: `pulumi stack output ringtail_authkey --show-secrets` 4. On ringtail NixOS installer: - `nix run github:nix-community/disko -- --mode disko /tmp/disk-config.nix` (or from cloned repo) - `nixos-install --flake github:eblume/blumeops?dir=nixos/ringtail#ringtail` 5. Reboot, `tailscale up --auth-key=<key>` 6. Verify: `tailscale status`, SSH from gilbert ## Test plan - [ ] Review NixOS configuration for completeness - [ ] Verify disko partition layout matches ringtail hardware - [ ] Run `pulumi preview` for tailscale stack - [ ] Install NixOS on ringtail - [ ] Confirm tailscale connectivity - [ ] Confirm sway desktop works - [ ] Test `mise run provision-ringtail` for ongoing management 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/207	2026-02-18 08:24:25 -08:00
Erich Blume	e6cf7e47e0	Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126 ) All checks were successful Deploy Fly.io Proxy / deploy (push) Successful in 1m8s Details ## Summary - Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy - Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test - Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses - Switch Alloy push endpoints from `.ops.eblu.me` (Caddy) to `.tail8d86e.ts.net` (Tailscale Ingress) - Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly ## Manual step (not in PR) Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes. ## Deployment order 1. Pulumi ACLs — `mise run tailnet-preview && mise run tailnet-up` 2. OAuth client — Manual update in Tailscale admin console 3. K8s Ingresses — `argocd app sync apps && argocd app sync docs loki prometheus` 4. Fly.io proxy — `mise run fly-deploy` 5. Verify — `mise run services-check`, check Grafana dashboards ## Test plan - [ ] `mise run tailnet-preview` shows clean diff - [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions - [ ] After deploy: Grafana dashboards show continued log/metric flow - [ ] `curl -sf https://docs.eblu.me` returns 200 - [ ] `mise run services-check` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126	2026-02-08 21:54:18 -08:00
Erich Blume	cc54b4f565	Add Fly.io proxy observability via embedded Alloy (#123 ) All checks were successful Deploy Fly.io Proxy / deploy (push) Successful in 1m16s Details ## Summary - Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus) - Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting) - Create two Grafana dashboards: Docs APM (per-service view filtered by `host="docs.eblu.me"`) and Fly.io Proxy Health (aggregate proxy health across all upstream services) ## Changed Files \| File \| Change \| \|------\|--------\| \| `fly/nginx.conf` \| Add JSON `log_format` + `access_log`, add `stub_status` endpoint \| \| `fly/Dockerfile` \| COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config \| \| `fly/alloy.river` \| New — Alloy config: log tailing, metric extraction, remote_write \| \| `fly/start.sh` \| Start Alloy after Tailscale, before nginx \| \| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` \| New — Docs APM dashboard \| \| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` \| New — Fly.io Proxy Health dashboard \| \| `argocd/manifests/grafana-config/kustomization.yaml` \| Register new dashboard configmaps \| \| `docs/reference/services/flyio-proxy.md` \| Document observability setup \| ## Deployment and Testing - [ ] `mise run fly-deploy` — rebuild container with Alloy - [ ] `curl https://docs.eblu.me/` — generate traffic - [ ] `fly logs -a blumeops-proxy` — verify Alloy startup - [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}` - [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}` - [ ] `argocd app sync grafana-config` — deploy dashboards - [ ] Verify dashboards show data in Grafana - [ ] `mise run services-check` — no regressions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123	2026-02-08 10:05:38 -08:00
Erich Blume	64a78422b1	Add Fly.io public reverse proxy for docs.eblu.me (#120 ) Some checks failed Deploy Fly.io Proxy / deploy (push) Failing after 9s Details ## Summary - Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale - First service exposed: `docs.eblu.me` — the Quartz static docs site - Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME - Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow ## Key details - Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed - Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts - nginx caches aggressively for the static site; health check is on the default_server block - ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only - DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev` ## Test plan - [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok` - [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status` - [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert - [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected) - [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120	2026-02-08 02:36:19 -08:00
Erich Blume	b08faa50cc	Add Gandi DNS management via Pulumi (#54 ) ## Summary - Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/` - Add Gandi LiveDNS management for `eblu.me` domain - Create wildcard DNS record `.ops.eblu.me` → indri's Tailscale IP (100.98.163.89) - Add mise tasks: `dns-up`, `dns-preview` - Update `tailnet-up` to pass `--yes` by default - Document PAT cycling process (expires every 30 days) ## Background This enables using real DNS names (`.ops.eblu.me`) that resolve to Tailscale IPs, which allows containers and other systems to resolve services without depending on MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain tailnet-only while using standard DNS. ## Deployment and Testing - [ ] Run `cd pulumi/gandi && uv sync` to install dependencies - [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack - [ ] Run `mise run dns-preview` to verify configuration - [ ] Run `mise run dns-up` to apply DNS records - [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54	2026-01-25 08:15:46 -08:00

7 commits