blumeops

Author	SHA1	Message	Date
Erich Blume	0cdc143227	Deploy Dex OIDC identity provider with Grafana SSO (#222 ) ## Summary - Deploys Dex OIDC identity provider on ringtail k3s cluster as central authentication service - Integrates Grafana as first SSO client via `auth.generic_oauth` - Uses Kubernetes CRD storage backend (no PVC needed) - All secrets (bcrypt hash, client secrets) injected via ExternalSecrets from 1Password item "Dex (blumeops)" - NixOS-built container image via `containers/dex/default.nix` ## Pre-requisites (manual, before deployment) 1. Create 1Password item "Dex (blumeops)" in `blumeops` vault with fields: - `password`: strong generated password for Dex login - `static-password-hash`: bcrypt hash of above (`htpasswd -BnC 10 eblume`, copy hash after `eblume:`) - `grafana-client-secret`: random 32-char hex (`openssl rand -hex 16`) 2. Build container: `mise run container-tag-and-release dex v1.0.0` ## Deployment sequence 1. Build container: `mise run container-tag-and-release dex v1.0.0` 2. Deploy Caddy: `mise run provision-indri -- --tags caddy` 3. Sync ArgoCD: `argocd app sync apps` → `argocd app sync dex` 4. Verify Dex: `curl https://dex.ops.eblu.me/.well-known/openid-configuration` 5. Sync Grafana: `argocd app sync grafana-config` → `argocd app sync grafana` 6. Test SSO: Visit `https://grafana.ops.eblu.me/login`, click "Sign in with Dex" ## Verification - [ ] Container image exists: `mise run container-list` shows `dex:v1.0.0-nix` - [ ] `curl https://dex.ops.eblu.me/.well-known/openid-configuration` returns valid OIDC discovery - [ ] `curl https://dex.ops.eblu.me/healthz` returns healthy - [ ] Grafana login shows "Sign in with Dex" button alongside local login - [ ] OIDC flow: click Dex → enter credentials → redirect back → logged in as Admin - [ ] Break-glass: local admin login still works - [ ] `mise run services-check` passes ## Files changed \| File \| Action \| Purpose \| \|------\|--------\|---------\| \| `containers/dex/default.nix` \| Create \| NixOS container build \| \| `argocd/apps/dex.yaml` \| Create \| ArgoCD app targeting ringtail \| \| `argocd/manifests/dex/*` (8 files) \| Create \| K8s manifests (RBAC, ExternalSecret, Deployment, Service, Ingress) \| \| `argocd/manifests/grafana-config/external-secret-dex-oauth.yaml` \| Create \| Grafana OIDC client secret \| \| `argocd/manifests/grafana-config/kustomization.yaml` \| Modify \| Add new ExternalSecret resource \| \| `argocd/manifests/grafana/values.yaml` \| Modify \| Add `auth.generic_oauth` config + envFromSecrets \| \| `ansible/roles/caddy/defaults/main.yml` \| Modify \| Add `dex.ops.eblu.me` reverse proxy entry \| \| `docs/changelog.d/feature-dex-oidc.feature.md` \| Create \| Changelog fragment \| Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/222	2026-02-19 20:24:24 -08:00
Erich Blume	b876e39981	Replace Homepage Helm chart with kustomize manifests and custom Dockerfile (#221 ) ## Summary - Replace third-party Helm chart (jameswynn/homepage v2.1.0, pinned at app v1.2.0) with plain kustomize manifests and a custom Dockerfile building from forge mirror at v1.10.1 - Adds Dockerfile (`containers/homepage/`) with multi-stage build (node:22-slim builder, node:22-alpine runtime) - Creates kustomize manifests: Deployment, Service, ConfigMap (6 config files), ServiceAccount, ClusterRole, ClusterRoleBinding - Keeps existing ingress-tailscale.yaml and all 6 ExternalSecret resources unchanged - Updates ArgoCD app definition from multi-source Helm to single directory source ## Prerequisite - Homepage source mirrored at forge.ops.eblu.me/eblume/homepage.git ✅ - Container must be built and pushed before syncing: `mise run container-release homepage v1.10.1` ## Deployment and Testing - [ ] Build and push container image: `mise run container-release homepage v1.10.1` - [ ] Branch-test via ArgoCD: `argocd app set homepage --revision feature/homepage-kustomize && argocd app sync homepage` - [ ] Verify dashboard loads at go.ops.eblu.me / go.tail8d86e.ts.net - [ ] Verify k8s autodiscovery works (services appear on dashboard) - [ ] Verify widgets load (weather, Forgejo, Jellyfin, etc.) - [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/221	2026-02-19 18:29:19 -08:00
Erich Blume	d5d32fe91f	Port Frigate NVR to ringtail k3s with GPU acceleration (#217 ) ## Summary - Enable NVIDIA container toolkit on ringtail NixOS and configure k3s containerd with nvidia runtime - Add NVIDIA device plugin ArgoCD app (RuntimeClass + DaemonSet) to expose `nvidia.com/gpu` resources - Re-target Frigate from indri minikube (arm64, ZMQ detector) to ringtail k3s (x86_64, TensorRT/ONNX) - Switch Frigate image to `-tensorrt` variant with GPU resource limits and increased shared memory ## Manual Prerequisites 1. NFS access: Verify ringtail can mount `sifaka:/volume1/frigate` ```fish ssh ringtail 'sudo mount -t nfs sifaka:/volume1/frigate /mnt/storage1 && ls /mnt/storage1 && sudo umount /mnt/storage1' ``` 2. YOLO model: Verify `/volume1/frigate/models/yolov9m.onnx` exists on sifaka ## Deployment Steps 1. Provision ringtail: `mise run provision-ringtail` 2. Sync ArgoCD apps: `argocd app sync apps --prune` 3. Deploy NVIDIA device plugin: `argocd app sync nvidia-device-plugin` 4. Verify GPU: `kubectl --context=k3s-ringtail get nodes -o json \| jq '.items[].status.capacity'` 5. Deploy Frigate: `argocd app sync frigate` ## Verification - [ ] `nvidia.com/gpu: 1` visible in node capacity - [ ] Frigate pod running with GPU allocated - [ ] Frigate UI loads at `https://nvr.ops.eblu.me` - [ ] Detector shows ONNX/TensorRT on System page - [ ] Camera feed with bounding boxes in live view - [ ] TensorRT engine build completes (watch logs on first start) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/217	2026-02-19 14:27:04 -08:00
Erich Blume	16a4a9a616	Port Mosquitto and ntfy to ringtail k3s, retire Apple Silicon Detector (#216 ) ## Summary - Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired - Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s - Move ntfy ArgoCD app from indri minikube to ringtail k3s - Update Frigate docs to reflect detector removal and planned RTX 4080 migration - Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail ## Deployment After merge: 1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps: ``` argocd app sync apps --prune ``` 2. Sync new ringtail apps: ``` argocd app sync mosquitto-ringtail argocd app sync ntfy-ringtail ``` 3. Manually clean up the detector LaunchAgent on indri: ``` ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist' ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist' ``` ## Notes - Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns) - ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup - Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed - Frigate + frigate-notify will be ported to ringtail in a follow-up PR 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216	2026-02-19 11:22:44 -08:00
Erich Blume	61ca1ca305	Deploy Tailscale operator on ringtail k3s cluster (#215 ) ## Summary - Extract shared Tailscale operator resources (CRDs, RBAC, Deployment, ProxyClass, DNSConfig) into `tailscale-operator-base/` so both clusters reference the same manifests - Add `tailscale-operator-ringtail/` overlay with 1-replica ProxyGroup and ExternalSecret for the shared OAuth client - Add ArgoCD Application targeting `ringtail.tail8d86e.ts.net:6443` - Update `.yamllint.yaml` ignore path for the moved `operator.yaml` ## Deployment and Testing - [ ] Sync `apps` app to pick up the new Application definition - [ ] `argocd app sync tailscale-operator-ringtail` - [ ] Verify ExternalSecret syncs: `kubectl --context=k3s-ringtail -n tailscale get externalsecret` - [ ] Verify operator pod runs: `kubectl --context=k3s-ringtail -n tailscale get pods` - [ ] Verify ProxyGroup ready: `kubectl --context=k3s-ringtail -n tailscale get proxygroups` - [ ] Verify indri operator still works: `argocd app diff tailscale-operator` - [ ] Check Tailscale admin for new operator device with `tag:k8s-operator` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/215	2026-02-19 09:33:05 -08:00
Erich Blume	918df9e642	Add k3s, 1Password Connect, and systemd nix-container-builder to ringtail (#209 ) ## Summary Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions runner for building containers with Nix. ### K3s cluster - Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint) - TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale - Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`) - Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access - `kubectl` added to system packages ### 1Password Connect + External Secrets Operator - Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`, `external-secrets-config-ringtail` - Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server - Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password` namespace in post_tasks ### Systemd Forgejo Actions runner - Native `services.gitea-actions-runner` with `forgejo-runner` package — no DinD, no k8s pod, runs directly on the NixOS host - Label `nix-container-builder:host` — jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH - Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env` - Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access ### Nix container build workflow - New `.forgejo/workflows/build-container-nix.yaml` triggers on `-nix-v[0-9]` tags (e.g. `nettest-nix-v1.0.0`) - Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy` - Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering ### Mise task updates - `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`) - `container-list` shows build type indicator (`[nix]` / `[dockerfile]`) ## Post-merge 1. `mise run provision-ringtail` — deploys k3s token, runner token, NixOS rebuild 2. Register k3s cluster in ArgoCD (first time only): ```fish ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' \| \ sed 's\|127.0.0.1\|ringtail.tail8d86e.ts.net\|' > /tmp/k3s-ringtail.yaml set -x KUBECONFIG /tmp/k3s-ringtail.yaml argocd cluster add default --name k3s-ringtail 3. Sync ArgoCD apps in order: 1password-connect-ringtail -> external-secrets-crds-ringtail -> external-secrets-ringtail -> external-secrets-config-ringtail 4. Verify runner: ssh ringtail 'systemctl status gitea-runner-nix-container-builder' 5. Check Forgejo admin panel for ringtail-nix-builder runner online 6. Test: create containers/<name>/default.nix, tag with <name>-nix-v0.1.0 Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209	2026-02-18 21:15:30 -08:00
Erich Blume	105a2c8c08	Update External Secrets Helm chart 1.3.1 → 2.0.0 (#203 ) ## Summary - Bump External Secrets Operator Helm chart from `helm-chart-1.3.1` to `helm-chart-2.0.0` (operator v1.3.2) - Updates both the operator app and CRDs app `targetRevision` - No Helm values changes needed — `installCRDs`, `resources`, `webhook`, `certController` keys are unchanged ## Breaking changes in chart 2.0.0 - Removed providers: Alibaba and Device42 (unmaintained) — does not affect our 1Password setup - Templating engine v1 deprecated — our ExternalSecrets don't set `engineVersion`, so they use the default (v2) - Webhook `failurePolicy` for SecretStore is now dynamic ## Deployment 1. Sync CRDs first: `argocd app set external-secrets-crds --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets-crds` 2. Sync operator: `argocd app set external-secrets --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets` 3. Verify: `kubectl --context=minikube-indri -n external-secrets get pods` 4. After merge, set both apps back to `--revision main` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/203	2026-02-17 10:43:21 -08:00
Erich Blume	22f418d0dc	Doc review: connect-to-postgres, create-release-artifact-workflow, deploy-k8s-service (#191 ) ## Summary Review session covering 3 docs, plus a codebase-wide cleanup: ### Docs reviewed - connect-to-postgres — verified end-to-end (psql connection tested), stamped - create-release-artifact-workflow — clarified that `build-blumeops.yaml` is only a version bump example (not a packages API example) - deploy-k8s-service — fixed stale repoURL (`indri:2200` → `forge.ops.eblu.me:2222`), wrong Caddy config keys (`upstream` → `backend`, added missing `host`), updated Homepage group to "Services", added Tailscale tag documentation ### Codebase cleanup - Migrated all remaining `op item get --fields` calls to `op read` URI syntax across 7 files (docs, READMEs, YAML comments) - Simplified the `op read` vs `op item get` guidance in CLAUDE.md ## Side findings (not addressed) - New `immich-pg` CNPG cluster not yet documented in the postgresql reference card ## Test plan - [x] `psql` connection to `pg.ops.eblu.me` verified - [x] All pre-commit hooks pass - [x] `docs-check-links`, `docs-check-index`, `docs-check-frontmatter` pass Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/191	2026-02-15 07:42:01 -08:00
Erich Blume	04c7f3c45a	Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190 ) ## Summary Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159): - Mosquitto — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth) - Ntfy — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me` - Frigate — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me` - frigate-notify — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT Also includes: - Prometheus scrape target for Frigate metrics - Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage) - Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me` ## Prerequisites - [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri) - [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields ## Deployment (after merge) ```bash argocd app sync apps argocd app sync mosquitto argocd app sync ntfy argocd app sync frigate argocd app sync grafana-config argocd app sync prometheus mise run provision-indri -- --tags caddy mise run services-check ``` ## Verification - [ ] Mosquitto pod running, accepting connections on 1883 - [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me` - [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed - [ ] Object detection working (ONNX, person/car/dog/cat) - [ ] Recordings appearing in NFS share on sifaka - [ ] frigate-notify sending detection alerts to Ntfy - [ ] Prometheus scraping Frigate metrics - [ ] Grafana dashboard showing Frigate data Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190	2026-02-14 21:27:44 -08:00
Erich Blume	b3747f6c95	Tier 1 version bumps (#186 ) All checks were successful Build Container / build (push) Successful in 8s Details ## Summary Audit and upgrade of all deployed images, helm charts, and custom container Dockerfiles to latest stable versions. This PR covers Tier 1 (low-risk minor/patch bumps only). ### Upstream images \| Image \| Old \| New \| \|-------\|-----\|-----\| \| kube-state-metrics \| v2.13.0 \| v2.18.0 \| \| prometheus \| v3.2.1 \| v3.9.1 \| \| loki \| 3.3.2 \| 3.6.5 \| \| alloy \| v1.5.1 \| v1.13.1 \| \| tailscale (proxy + operator) \| v1.92.5 \| v1.94.1 \| \| navidrome \| :latest \| v0.60.3 (pinned) \| ### Helm charts \| Chart \| Old \| New \| \|-------\|-----\|-----\| \| CloudNativePG \| v0.27.0 \| v0.27.1 \| \| 1Password Connect \| 2.2.1 \| 2.3.0 \| ### Custom containers (Dockerfiles updated, images not yet tagged) \| Container \| Changes \| New tag \| \|-----------\|---------\|---------\| \| miniflux \| 2.2.16→2.2.17 (security), alpine 3.22 \| v1.1.0 \| \| kubectl \| v1.34.1→v1.34.4, alpine 3.22 \| v1.1.0 \| \| kiwix-serve \| alpine 3.22 \| v1.1.0 \| \| nettest \| alpine 3.22 \| v0.14.0 \| \| transmission \| alpine 3.22, pkg 4.0.6-r4 \| v1.1.0 \| All custom containers verified with local `dagger call build`. ### Deferred to Tier 2 (separate PRs) - Forgejo runner 6→12 (major version scheme change) - Docker DinD 27→29 - Grafana chart 8→11 (repo migration) - External Secrets 1→2 (breaking changes) - Python 3.12→3.13, Elixir 1.18→1.19, Node 22→24 - Transmission 4.0.6→4.1.0 (not in Alpine yet) ## Deployment After merge: 1. Tag custom containers: `mise run container-tag-and-release <name> <version>` for each 2. Wait for CI builds to complete 3. `argocd app sync apps` then sync individual apps, or let ArgoCD auto-detect Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/186	2026-02-13 17:16:37 -08:00
Erich Blume	01e19023ee	Add CV/resume web app at cv.ops.eblu.me (#169 ) ## Summary - nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz) - ArgoCD app + k8s manifests (deployment, service, Tailscale ingress) - Caddy route for `cv.ops.eblu.me` - Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD - Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv) ## Deployment steps (after merge) 1. `mise run container-tag-and-release cv v1.0.0` 2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`) 3. Run "Deploy CV" workflow in blumeops (default: latest) 4. `mise run provision-indri -- --tags caddy` 5. Verify at `https://cv.ops.eblu.me/` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169	2026-02-12 11:09:41 -08:00
Erich Blume	1c86134a62	Phase 1b: Deploy docs hosting with Quartz (#85 ) ## Summary - Add ArgoCD Application and manifests for `quartz` service - Add `docs.ops.eblu.me` to Caddy reverse proxy configuration - ConfigMap points to blumeops v1.0.0 release tarball - Tailscale ingress with homepage annotations for auto-discovery ## Deployment and Testing Pre-deployment (container build): - [ ] Build and tag quartz container: `mise run container-tag-and-release quartz v1.0.0` K8s deployment: - [ ] Sync apps: `argocd app sync apps` - [ ] Point quartz at feature branch: `argocd app set quartz --revision feature/docs-phase-1b-hosting` - [ ] Sync quartz: `argocd app sync quartz` - [ ] Verify pod is running: `kubectl --context=minikube-indri get pods -n quartz` - [ ] Verify Tailscale ingress: `kubectl --context=minikube-indri get ingress -n quartz` Caddy deployment: - [ ] Dry run: `mise run provision-indri -- --tags caddy --check --diff` - [ ] Apply: `mise run provision-indri -- --tags caddy` Verification: - [ ] Test https://docs.tail8d86e.ts.net - [ ] Test https://docs.ops.eblu.me - [ ] Verify homepage dashboard shows docs link Post-merge: - [ ] Reset to main: `argocd app set quartz --revision main && argocd app sync quartz` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/85	2026-02-03 10:52:20 -08:00
Erich Blume	ade21cc49e	Add Navidrome music streaming server (#79 ) ## Summary - Deploy Navidrome music streaming server to k8s - NFS mount for music library from sifaka:/volume1/music (read-only) - Local PVC for SQLite database and config (10Gi) - Tailscale ingress for dj.tail8d86e.ts.net - Caddy reverse proxy for dj.ops.eblu.me - Homepage annotations for dashboard discovery in Media group ## Deployment and Testing - [ ] Sync `apps` application to pick up new Application definition - [ ] Set navidrome app to feature branch and sync - [ ] Verify NFS mount with `kubectl exec` - [ ] Provision Caddy for dj.ops.eblu.me - [ ] Access https://dj.ops.eblu.me and create initial admin user - [ ] Verify Homepage shows DJ in Media group - [ ] Reset to main and resync after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/79	2026-01-31 20:19:31 -08:00
Erich Blume	e830a0f26b	Homepage dashboard improvements (#76 ) ## Summary - Fix ArgoCD icon (use `argo-cd.png` per Dashboard Icons naming) - Add Borgmatic backup metrics widget (time since last backup, archive size) - Add Sifaka NAS disk usage widget (used/total space) - Create `[[grafana]]` zk card with management notes ## What didn't work Attempted Grafana iframe embedding for a metrics panel but reverted: - Homepage iframe widget only supports height classes, not width - Some panels fail to load even with anonymous auth enabled - Documented in grafana zk card for future reference ## Deployment and Testing - [x] ArgoCD icon displays correctly - [x] Borgmatic metrics show time since backup and archive size - [x] NAS disk usage shows used/total bytes - [x] Grafana reverted to authenticated-only access 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/76	2026-01-30 15:05:02 -08:00
Erich Blume	38538ad5f0	Replace hajimari with gethomepage (#75 ) ## Summary - Remove hajimari (unmaintained since Oct 2022, broken helm deps) - Add gethomepage (28k stars, actively maintained, monthly releases) - Migrate custom apps, bookmarks, and search config - Enable k8s RBAC for service autodiscovery - Configure Tailscale ingress at go.tail8d86e.ts.net ## Why the switch Hajimari hasn't released since October 2022. The helm chart has a broken dependency (bjw-s/common URL is 404), and unreleased code on main has bugs. gethomepage has similar k8s autodiscovery via ingress annotations and is very actively maintained. ## Deployment and Testing - [ ] Delete hajimari app from ArgoCD - [ ] Delete hajimari namespace - [ ] Sync apps to pick up new homepage app - [ ] Sync homepage app - [ ] Verify go.ops.eblu.me loads 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/75	2026-01-30 13:21:12 -08:00
Erich Blume	9fd70c3892	Switch Hajimari to custom fork - Use chart from forge.ops.eblu.me/eblume/hajimari fork - Use custom image from registry.ops.eblu.me/blumeops/hajimari - Enables future customizations (search auto-focus, weather widget) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 11:26:20 -08:00
Erich Blume	d1164c8aac	Add Hajimari service dashboard (#73 ) ## Summary - Add Hajimari as a service dashboard/start page at `go.ops.eblu.me` - Auto-discovers k8s services from ingress annotations - Custom apps for non-k8s services: Forgejo, Registry, Sifaka NAS - Add `nas.ops.eblu.me` Caddy proxy to Synology dashboard ## Services Configured Auto-discovered (k8s ingresses with hajimari.io annotations): - Grafana, ArgoCD, Prometheus, Loki (Observability) - Miniflux, Kiwix, Transmission, TeslaMate, Immich (Apps) - PyPI/devpi (Infrastructure) Custom apps (non-k8s): - Forgejo (forge.ops.eblu.me) - Registry (registry.ops.eblu.me) - Sifaka NAS (nas.ops.eblu.me) Bookmarks: - Tailscale Admin, 1Password, Pulumi ## Deployment and Testing - [ ] Sync `apps` application to pick up new Hajimari Application - [ ] Sync `hajimari` application - [ ] Run `mise run provision-indri -- --tags caddy` for go/nas proxy entries - [ ] Re-sync all k8s apps with hajimari annotations (or wait for natural drift) - [ ] Verify https://go.ops.eblu.me shows dashboard with all services 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/73	2026-01-29 15:51:42 -08:00
Erich Blume	0e2df9645d	Fix ArgoCD sync drift for apps and immich (#71 ) ## Summary - Fix immich app to track `main` branch instead of `feature/immich` for values - The tailscale-operator ignoreDifferences schema drift will be fixed by syncing the `apps` app ## Deployment and Testing - [ ] Sync `apps` to fix tailscale-operator schema drift - [ ] Sync `immich` to pick up correct image versions from main 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/71	2026-01-29 10:24:26 -08:00
Erich Blume	482414346e	Add External Secrets Operator with 1Password Connect (#66 ) (#66 ) ## Summary - Add 1Password Connect server for secrets automation API - Add External Secrets Operator (ESO) to sync secrets from 1Password to K8s - Add ClusterSecretStore connecting ESO to 1Password Connect - Convert devpi secret to ExternalSecret as proof of concept ## Architecture ``` 1Password Cloud → 1Password Connect (k8s) → ESO → Native K8s Secrets ``` ## Deployment and Testing - [ ] Mirror Helm charts to forge (connect-helm-charts, external-secrets) - DONE - [ ] Create 1Password Connect credentials (`op connect server create`) - [ ] Store credentials in 1Password item "1Password Connect" - [ ] Bootstrap secret: `op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl \| kubectl apply -f -` - [ ] Deploy 1password-connect: `argocd app sync 1password-connect` - [ ] Deploy external-secrets: `argocd app sync external-secrets` - [ ] Deploy external-secrets-config: `argocd app sync external-secrets-config` - [ ] Test devpi ExternalSecret: `argocd app sync devpi` - [ ] Verify secret synced: `kubectl get externalsecret -n devpi` ## Future Work After PoC validated, migrate remaining 12 secret templates to ExternalSecrets: - databases (3), tailscale-operator (1), grafana-config (2), teslamate (2) - forgejo-runner (1), argocd (1), immich (1), 1password-connect (1 - self-bootstrap) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/66	2026-01-28 19:30:10 -08:00
Erich Blume	8621996343	Add Immich photo management + migrate forge URLs (#62 ) ## Summary - Migrate all ArgoCD app repo URLs from `indri.tail8d86e.ts.net:2200` to `forge.ops.eblu.me:2222` - Add Immich self-hosted photo management service with: - Helm chart deployment via ArgoCD - PostgreSQL cluster with pgvecto.rs for AI vector search (immich-pg) - NFS storage on sifaka for photo library (2Ti) - Tailscale Ingress + Caddy proxy for `photos.ops.eblu.me` - Machine learning service for face/object recognition ## Deployment and Testing - [x] Update ArgoCD repo-creds-forge secret with new URL (one-time manual step) - [ ] Sync `apps` to pick up new applications - [ ] Sync all existing apps to verify new forge URL works - [ ] Sync `blumeops-pg` to deploy immich-pg cluster - [ ] Wait for immich-pg to be healthy - [ ] Create immich-db secret from auto-generated password - [ ] Sync `immich-storage` (PV, PVC, Ingress) - [ ] Sync `immich` (Helm chart) - [ ] Run `mise run provision-indri -- --tags caddy` to add photos.ops.eblu.me - [ ] Verify Immich UI is accessible 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/62	2026-01-26 11:20:11 -08:00
Erich Blume	ea42362b6f	Migrate Forgejo runner to Kubernetes with DinD (#60 ) ## Summary - Deploy Forgejo runner to k8s with Docker-in-Docker sidecar - Add job execution image with Node.js and Docker CLI - Retire host-mode runner on indri - All CI jobs now run containerized in k8s ## Components Added - `containers/forgejo-runner/Dockerfile` - Job execution image - `argocd/apps/forgejo-runner.yaml` - ArgoCD Application - `argocd/manifests/forgejo-runner/` - Kubernetes manifests ## Components Removed - `ansible/roles/forgejo_runner/` - No longer needed ## Changes to Existing Files - `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env - `.github/actionlint.yaml` - Only `k8s` label now valid ## Deployment 1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl \| kubectl --context=minikube-indri apply -f -` 2. Sync ArgoCD: `argocd app sync forgejo-runner` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60	2026-01-25 19:56:17 -08:00
Erich Blume	8ca8798121	Switch to Buildah for container builds (#51 ) All checks were successful Test CI / test (push) Successful in 4s Details ## Summary - Replace Docker with Buildah for container image builds - No Docker socket required - buildah is daemonless - Cleaner security model (no privileged containers or socket mounting) - Remove Docker-related security context from deployment ## Changes - Update Dockerfile to install buildah/podman instead of docker-cli - Configure buildah storage with overlay driver and fuse-overlayfs - Update composite action to use `buildah bud` and `buildah push` - Add `imagePullPolicy: Always` to ensure fresh image pulls - Update test workflow to verify buildah/podman ## Testing - [ ] Runner pod starts successfully - [ ] Buildah is available in runner - [ ] Test workflow verifies buildah/podman versions - [ ] Container build workflow builds and pushes to zot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51	2026-01-24 13:30:26 -08:00
Erich Blume	7893c41020	Enable Forgejo Actions (Phase 1) (#48 ) All checks were successful Test CI / test (push) Successful in 0s Details ## Summary - Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password - Enable Forgejo Actions in config (`[actions] ENABLED = true`) - Add `repo.actions` to DEFAULT_REPO_UNITS - Clean up unused MySQL database fields (we use SQLite) ## Phase 1 Progress This PR covers the first part of Phase 1 (ci-cd-bootstrap plan): - [x] Refactor app.ini to ansible template - [x] Store secrets in 1Password - [x] Enable Actions in config - [ ] Deploy config changes (pending review) - [ ] Create runner registration token - [ ] Deploy runner to k8s - [ ] Test with simple workflow ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags forgejo` to deploy - [ ] Verify Forgejo restarts correctly - [ ] Verify Actions tab appears in repo settings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48	2026-01-23 17:00:12 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47 ) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl \| kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl \| kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	e4a8405de7	Observability cleanup and k8s service monitoring (#43 ) (#43 ) ## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43	2026-01-22 13:51:01 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	5a829e0afd	Remove unused indri tags and ansible roles (#41 ) ## Summary - Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission - Also remove unused node_exporter and podman ansible roles - Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed) - Update indri description to reflect current architecture ## Changes Ansible roles removed (34 files, ~1000 lines): - devpi, devpi_metrics - kiwix - transmission, transmission_metrics - node_exporter - podman Pulumi indri tags removed: - tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed These services now run in k8s with their own Tailscale devices via tailscale-operator. ## Deployment and Testing - [x] Verified remaining ansible roles match indri.yml - [x] Verified no playbooks or role dependencies reference removed roles - [ ] Run `pulumi preview` to verify tag changes - [ ] Run `pulumi up` to apply tag changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41	2026-01-21 20:18:53 -08:00
Erich Blume	7ec98210a9	P6: Migrate Kiwix and Transmission to Kubernetes (#39 ) ## Summary - Add Transmission BitTorrent daemon to k8s (torrent namespace) - Add Kiwix ZIM archive server to k8s (kiwix namespace) - NFS storage from sifaka for shared torrent/ZIM data - Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list - ZIM-watcher CronJob to auto-restart kiwix when new archives appear - Remove transmission, transmission_metrics, and kiwix ansible roles from indri - Remove svc:kiwix from tailscale_serve defaults ## Key Decisions - Direct NFS mount for kiwix (no PVC) since it shares storage with transmission - Shell wrapper for kiwix-serve command (glob expansion) - Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism) - Completed downloads stored in `/downloads/complete/` on sifaka ## Deployment and Testing - [x] Deployed transmission to k8s - [x] Verified transmission web UI at torrent.tail8d86e.ts.net - [x] Moved existing ZIM files to complete folder - [x] Deployed kiwix to k8s - [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net - [x] Stopped old services on indri - [x] Cleared svc:kiwix from Tailscale serve on indri - [x] Updated zk documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39	2026-01-21 18:07:40 -08:00
Erich Blume	0439fbb704	P5: Migrate devpi to Kubernetes (#34 ) ## Summary - Migrate devpi PyPI caching proxy from indri LaunchAgent to Kubernetes - Custom container image with devpi-server + devpi-web + auto-init - StatefulSet with 50Gi PVC, Tailscale Ingress at pypi.tail8d86e.ts.net - Remove devpi from ansible playbooks and update CLAUDE.md with k8s workflow ## Key Changes - Add CRI-O registry mirror config for registry.tail8d86e.ts.net - Change ArgoCD apps to manual sync (was auto-sync causing issues) - 2Gi memory limit for Whoosh indexer (reclaimed after startup) ## Deployment and Testing - [x] devpi pod healthy in k8s - [x] pip install through proxy works - [x] mcquack 1.0.0 uploaded and installable - [x] Old devpi stopped on indri ## Post-Merge Reset ArgoCD to main: ``` argocd app set apps --revision main && argocd app sync apps argocd app set devpi --revision main && argocd app sync devpi ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/34	2026-01-20 14:55:37 -08:00
Erich Blume	735b643429	P4: Miniflux migration + PostgreSQL consolidation (#33 ) ## Summary - Deploy miniflux in k8s via ArgoCD - Expose via Tailscale Ingress at feed.tail8d86e.ts.net - Retire brew PostgreSQL (no longer needed) - Rename k8s-pg to pg (canonical hostname) - Remove ansible miniflux and postgresql roles - Update borgmatic to backup pg.tail8d86e.ts.net - Update all zk documentation ## Deployment and Testing - [x] Miniflux pod running in k8s - [x] User login works at https://feed.tail8d86e.ts.net - [x] Feeds and entries visible - [x] brew miniflux and postgresql stopped - [x] Tailscale services migrated (feed, pg) - [x] zk documentation updated - [x] Run ansible to apply role removals - [ ] Verify borgmatic backup with new pg hostname 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33	2026-01-20 09:04:47 -08:00
Erich Blume	eb952aae01	P3: PostgreSQL disaster recovery test and borgmatic k8s-pg backup (#32 ) ## Summary - Fixed borgmatic `borg: command not found` by adding `local_path` config option - Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg - Added borgmatic user to k8s-pg via CloudNativePG managed roles - Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases - Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432 - Disabled selfHeal on apps app to allow manual revision changes during development ## Changes - `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry - `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass` - `argocd/apps/apps.yaml` - Disabled selfHeal - `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role - `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template - `pulumi/policy.hujson` - Added ACL grant for backup access ## Deployment and Testing - [x] Borgmatic backup runs successfully - [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified) - [x] borgmatic user created in k8s-pg with pg_read_all_data role - [x] Both localhost and k8s-pg databases in backup archive - [x] zk documentation updated (borgmatic.md, postgresql.md) - [ ] After merge: set blumeops-pg app back to main revision 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32	2026-01-19 18:00:32 -08:00
Erich Blume	7e6742ad24	K8s Migration Phase 2: Grafana to Kubernetes (#30 ) ## Summary - Migrate Grafana from Homebrew/Ansible to Kubernetes deployment - Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed) - Add Grafana Helm chart deployment via ArgoCD with multi-source pattern - Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps) - Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal` ## Key Changes - `argocd/apps/grafana.yaml` - Grafana Helm chart Application - `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps - `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo - `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0 ## Deployment and Testing - [x] Deploy Loki config change: `mise run provision-indri -- --tags loki` - [x] Create namespace: `ki create namespace monitoring` - [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl \| ki apply -f -` - [x] Sync ArgoCD apps (grafana, grafana-config) - [x] Verify Grafana works at https://grafana.tail8d86e.ts.net - [x] Remove svc:grafana from ansible tailscale_serve - [x] Stop brew grafana: `ssh indri 'brew services stop grafana'` - [x] Delete ansible grafana role 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30	2026-01-19 14:40:25 -08:00
Erich Blume	a8f4d00294	K8s Migration Phase 1: Infrastructure Setup (#29 ) ## Summary - Split k8s migration plan into phases folder for easier navigation - Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads - Phase 1 work in progress ## Phase 1 Goals - Tailscale Kubernetes Operator - CloudNativePG Operator - PostgreSQL cluster for future app migrations ## Deployment and Testing - [ ] Review Phase 1 plan - [ ] `mise run tailnet-preview` to verify ACL changes - [ ] `mise run tailnet-up` to apply ACL changes - [ ] Create Tailscale OAuth client (manual) - [ ] Deploy operators and PostgreSQL cluster 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29	2026-01-19 09:49:52 -08:00

33 commits