blumeops

Author	SHA1	Message	Date
Erich Blume	77a1ea15d2	Remove mikado frontmatter from closed chains, clarify finalization rules During finalization, all mikado frontmatter (requires, status, branch) should be removed — cards become plain documentation linked via wiki-links. Updated agent-change-process docs and cleaned up 10 cards from closed chains. Also fixed ai-docs referencing deleted plans/ files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:43:19 -08:00
Erich Blume	55a846eb25	Retire plans directory, convert migrate-forgejo-from-brew to mikado card The plans/ directory predated the mikado method approach. Deleted all completed and abandoned plans, converted the still-relevant migrate-forgejo-from-brew into a lean mikado chain root card under how-to/forgejo/, cleaned up dangling wiki-links across docs, and fixed a stale "pre-commit" reference to "prek". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:28:14 -08:00
Erich Blume	6ca3c67705	Add Ollama reference card and update indexes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 19:43:14 -08:00
Erich Blume	5ddb47de1c	Review upgrade-grafana doc: fix image tag ref, add sidecar link Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 07:53:22 -08:00
Erich Blume	b460333da0	Upgrade Transmission to 4.1.1 (#282 ) ## Summary - Upgrade Transmission from 4.0.6-r4 to 4.1.1-r1 - Uses Alpine edge community repo for transmission packages, keeping stable alpine:3.22 base - Fix stale image reference in service doc (was linuxserver, now custom registry image) - Mark transmission as reviewed in service-versions.yaml ## Context Service review found Transmission two minor versions behind (4.0.6 → 4.1.1). Alpine 3.22 only packages 4.0.6, so transmission is installed from edge's community repo with an exact version pin. 4.1.0 added improved µTP performance, IPv6/dual-stack UDP tracker, JSON-RPC 2.0 API. 4.1.1 is a bugfix release (20+ fixes). Dagger test build passed locally. ## Deployment and Testing - [ ] Build container via Forgejo workflow (`mise run container-build-and-release transmission`) - [ ] Update kustomization.yaml with new image tag - [ ] `argocd app set torrent --revision feature/transmission-review && argocd app sync torrent` - [ ] Verify web UI at https://torrent.ops.eblu.me - [ ] Check Grafana Transmission dashboard still receives metrics - [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent` ## Note The transmission-exporter sidecar (OOMKilling every ~30min, 294 restarts) is being tracked separately as a future replacement project. Reviewed-on: #282	2026-03-04 07:44:33 -08:00
Erich Blume	b3d5478020	Use towncrier orphan fragment naming for C0 changes C0 changes have no branch name, so `main.<type>.md` fragments collide. Switch to towncrier's `+<slug>.<type>.md` orphan convention and rename existing `main.*` fragments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 15:30:00 -08:00
Erich Blume	d7f0aa6f96	Fix Frigate database path to use persistent volume The database was at /config/frigate.db (emptyDir, ephemeral) instead of /db/frigate.db (PVC, persistent). Every pod restart wiped the database, losing all recording history and leaving orphaned files on NFS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 15:18:16 -08:00
Erich Blume	a4f5f7ce09	Add changelog fragment for Frigate memory limit bump Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:58:35 -08:00
Erich Blume	a2bb9abbdb	Home-build grafana-sidecar container (#281 ) ## Summary - Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0` - Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop - Adds Dockerfile, service-versions entry, docs, and changelog fragment - Manifest switch to home-built image pending container build ## Deployment and Testing - [ ] `mise run container-build-and-release grafana-sidecar` - [ ] Update kustomization.yaml with built image tag - [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana` - [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me - [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana` Reviewed-on: #281	2026-03-03 13:48:24 -08:00
Erich Blume	81a8ca24b9	Clarify that changelog fragments apply to all change levels (C0–C2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:15:06 -08:00
Erich Blume	876e51dd77	Allow implicit octals in yamllint and normalize k8s mode values Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:10:44 -08:00
Erich Blume	4518fa3ac3	Add changelog fragment for Gandi bookmark Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:06:02 -08:00
Erich Blume	3dc4ed730b	Build Loki container image locally (#280 ) ## Summary - Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/` - Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki` - Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build ## Details - UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup) - CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build` - Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session) - Pattern follows miniflux (two-stage Go) + prometheus (ldflags) ## Deployment and Testing - [ ] Trigger container build: `mise run container-build-and-release loki` - [ ] Update kustomize tag to actual build tag - [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki` - [ ] Verify `/ready` endpoint and log ingestion - [ ] After merge: update to `[main]` tag (C0 follow-up) Reviewed-on: #280	2026-03-03 13:00:43 -08:00
Erich Blume	eb9bc57351	Upgrade TeslaMate v2.2.0 → v3.0.0 (#279 ) ## Summary - Upgrade TeslaMate from v2.2.0 to v3.0.0 (first service review) - Elixir 1.18 → 1.19.5, runtime base bookworm → trixie - Adds zstd/brotli build deps for new static asset compression - DB migration (BTREE → BRIN indexes) runs automatically via entrypoint ## Deployment and Testing - [ ] Trigger container build: `mise run container-build-and-release teslamate` - [ ] Update kustomization.yaml with new image tag - [ ] Deploy from branch: `argocd app set teslamate --revision upgrade/teslamate-v3.0.0 && argocd app sync teslamate` - [ ] Verify TeslaMate UI loads and data is intact - [ ] Check logs for migration errors - [ ] After merge: reset ArgoCD to main, update kustomization tag to `[main]` image Reviewed-on: #279	2026-03-03 11:56:40 -08:00
Erich Blume	823a35bb9a	Add changelog fragment for ringtail firewall fix Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 11:17:48 -08:00
Erich Blume	6c5a99883f	Add pre-commit check for changelog fragment placement Misfiled fragment from feature/ branch created a subdirectory under changelog.d/ which towncrier doesn't support. Move the fragment to the correct flat location and add a changelog-check mise task + prek hook to prevent this from happening again. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:49:01 -08:00
Erich Blume	7b68be2e80	Add fly.io proxy observability and app logs to Forgejo dashboard Rename "Forgejo Repository Health" to "Forgejo" and add proxy metrics (request rate, error rate, RPS, latency, bandwidth), proxy access logs, and Forgejo application logs from Loki. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:24:53 -08:00
Erich Blume	cf8736c73b	Review kustomize-grafana-deployment: fix manifest table to match reality The doc listed a nonexistent configmap.yaml instead of the actual raw config files (grafana.ini, datasources.yaml, provider.yaml) consumed by kustomization.yaml's configMapGenerator. Added last-reviewed date. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:14:41 -08:00
Erich Blume	a87c997ee1	Expose Forgejo publicly at forge.eblu.me (#278 ) ## Summary Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service. - Forgejo hardening: Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO) - Tailscale Ingress: ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint - Fly.io proxy: nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit - Authentik: OAuth callback updated to forge.eblu.me - DNS/TLS: CNAME record in Pulumi, cert in fly-setup - Rename: ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is) ## Deployment Order 1. `mise run provision-indri -- --tags forgejo` (config changes) 2. Verify forge.ops.eblu.me still works 3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator` 4. Verify `curl https://forge.tail8d86e.ts.net` 5. `cd fly && fly deploy` 6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/` 7. `fly certs add forge.eblu.me -a blumeops-proxy` 8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik` 9. `mise run dns-preview && mise run dns-up` 10. Full verification (see below) 11. Rehearse `mise run fly-shutoff` 12. After merge: reset ArgoCD revisions to main, re-sync ## Verification Checklist - [ ] forge.eblu.me loads, shows public repos - [ ] forge.ops.eblu.me still works from tailnet - [ ] SSH clone via forge.ops.eblu.me:2222 works - [ ] HTTPS clone via forge.eblu.me works - [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH - [ ] /swagger returns 403 - [ ] Rapid login attempts trigger 429 rate limit - [ ] fail2ban bans after 5 failed logins in 10 minutes - [ ] ArgoCD can still sync (SSH unaffected) - [ ] `mise run fly-shutoff` stops all public traffic - [ ] `mise run services-check` passes Reviewed-on: #278	2026-03-03 08:40:41 -08:00
Erich Blume	31d925814f	Deploy Ollama LLM server on ringtail (#277 ) ## Summary - Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration - Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern) - Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b` - hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi) - Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet - Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080 ## Deployment and Testing - [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin` - [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2` - [ ] Sync `apps` app with `--revision feature/ollama-ringtail` - [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama` - [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail` - [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags` - [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'` - [ ] Verify Frigate still works after GPU sharing change - [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277	2026-03-02 20:39:51 -08:00
Forgejo Actions	0f79c61c42	Update docs release to v1.12.1 - Built changelog from towncrier fragments [skip ci]	2026-03-02 18:17:07 -08:00
Erich Blume	7a1875936c	Switch git hooks from pre-commit to prek (#276 ) ## Summary - Replace pre-commit with [prek](https://github.com/j178/prek), a faster Rust-native drop-in alternative - Migrate config from `.pre-commit-config.yaml` (YAML) to `prek.toml` (TOML) - Add new built-in checks: case conflicts, private key detection, executable shebangs - Install prek via mise native registry (`aqua:j178/prek`) instead of pipx - Update all doc references across README, contributing guide, and how-to docs ## Notes - `check-yaml` still uses the remote `pre-commit-hooks` repo because prek's builtin fast path doesn't support `--unsafe` yet (needed for Ansible custom YAML tags) - All existing custom hooks (docs validation, container version check, mikado invariant, workflow validation) work unchanged - Tested: all hooks pass on clean tree, deliberate doc link breakage is caught ## Test plan - [x] `prek run --all-files` passes all checks - [x] Broken wiki-link correctly caught by `docs-check-links` - [x] taplo-format auto-fixes TOML formatting on commit - [x] commit-msg hook (mikado invariant) fires correctly Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/276	2026-03-02 18:15:23 -08:00
Erich Blume	2d54f93c68	Add changelog for impl-card-guard feature Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 17:45:01 -08:00
Erich Blume	9465b75815	Add changelog for authentik source chain doc review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 07:28:09 -08:00
Erich Blume	08b9570ac7	Review build-authentik-from-source Mikado chain docs Fix go-server-derivation: wrong path target (webui not authentik-django) and missing internal/web/static.go patch. Remove stale DRF fork content from mirror-build-deps (no longer needed as of 2026.2.0). Add last-reviewed to all 5 cards without it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 07:28:09 -08:00
Forgejo Actions	847e47eaf3	Update docs release to v1.12.0 - Built changelog from towncrier fragments [skip ci]	2026-03-01 17:24:09 -08:00
Erich Blume	2a2811d7a5	Review authentik-api-client-generation doc: fix stale content Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 17:21:46 -08:00
Erich Blume	c9d273dc81	Update authentik changelog fragment to mention version upgrade Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 17:11:41 -08:00
Erich Blume	2d4098e480	Fix authentik 2026.2.0 migration ordering bug (#275 ) ## Summary - Patch `authentik_rbac/0010` migration to depend on `authentik_core/0056`, fixing non-deterministic ordering that crashes startup with `FieldError: Cannot resolve keyword 'group_id'` - Upstream bug: goauthentik/authentik#19616, #20634 — no fix released yet - Document the issue in the lessons-learned table ## Deployment and Testing - [ ] CI builds container image - [ ] Deploy from branch: `argocd app set authentik --revision fix/authentik-migration-ordering && argocd app sync authentik` - [ ] Pods reach Running/Ready without crash-looping - [ ] `kubectl logs` show 0056 migrating before 0010 - [ ] authentik UI loads at authentik.ops.eblu.me - [ ] `mise run services-check` - [ ] After merge: `argocd app set authentik --revision main && argocd app sync authentik` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/275	2026-03-01 16:28:36 -08:00
Erich Blume	efa9806bfa	C2: Build authentik from source (Mikado chain) (#274 ) ## Mikado Chain: build-authentik-from-source Replace `pkgs.authentik` from nixpkgs with a custom Nix derivation built from source. This removes the dependency on the nixpkgs packaging timeline and gives full version control. Target version: 2025.12.4 (nixpkgs reference, upgrading from deployed 2025.10.1). ### Dependency Graph ``` build-authentik-from-source (goal) ├── authentik-go-server-derivation │ ├── authentik-api-client-generation ← IN PROGRESS │ └── authentik-python-backend-derivation ├── authentik-web-ui-derivation │ └── authentik-api-client-generation ← IN PROGRESS └── authentik-python-backend-derivation ``` ### Ready Leaves - `authentik-api-client-generation` — Go + TypeScript client generation from OpenAPI schema - `authentik-python-backend-derivation` — Django backend with 60+ deps, 4 in-tree packages ### Architecture Ported from [nixpkgs `pkgs/by-name/au/authentik/package.nix`](https://github.com/NixOS/nixpkgs/tree/master/pkgs/by-name/au/authentik): - `source.nix` — shared version/source fetch - `client-go.nix` — Go API client generation - `client-ts.nix` — TypeScript API client generation - `api-go-vendor-hook.nix` — Go vendor directory injection hook - (more components to follow as leaves are closed) ### Related Cards - [[build-authentik-from-source]] — Goal card - [[authentik-api-client-generation]] - [[authentik-python-backend-derivation]] - [[authentik-web-ui-derivation]] - [[authentik-go-server-derivation]] Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/274	2026-03-01 13:45:00 -08:00
Erich Blume	0aaf9bb8b2	Add Dagger local build step to authentik source build goal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 08:39:25 -08:00
Erich Blume	7094ea7d3e	Start C2 Mikado chain: build authentik from source Create goal card and 4 prerequisite cards for building authentik from a custom Nix derivation instead of using pkgs.authentik from nixpkgs. This removes the dependency on the nixpkgs packaging timeline and gives full version control over authentik releases. Chain: mikado/authentik-source-build Leaf nodes: authentik-api-client-generation, authentik-python-backend-derivation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 08:20:17 -08:00
Erich Blume	922265c88f	Add changelog fragment for grafana doc review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 07:28:47 -08:00
Erich Blume	8d1e98617b	Review build-grafana-container docs: stamp reviewed, fix cross-links Also fix stale grafana.md reference card (Helm → Kustomize). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 07:28:06 -08:00
Erich Blume	02eb169403	Pin blumeops-pg to PostgreSQL 18.3 Replace floating :18 tag with pinned :18.3 (upstream out-of-cycle release fixing 18.2 regressions). Stamps service as reviewed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 16:25:32 -08:00
Erich Blume	7cecaf0471	Review forgejo-runner docs: stamp reviewed, fix cross-links Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 15:10:20 -08:00
Erich Blume	776caa87f5	Sync Frigate zone coordinates from live API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 07:52:09 -08:00
Forgejo Actions	fa223f8e3b	Update docs release to v1.11.5 - Built changelog from towncrier fragments [skip ci]	2026-02-26 07:56:02 -08:00
Erich Blume	be3cdad1cb	Add HA for CV and Docs: zero-downtime deploys (#273 ) ## Summary - Set `replicas: 2` with `maxUnavailable: 0` / `maxSurge: 1` on CV and Docs deployments so rolling updates never drop below 2 ready pods - Add PodDisruptionBudgets (`minAvailable: 1`) to protect against node drains and cluster maintenance - Add Fly.io cache purge step to `cv-deploy.yaml` workflow (docs already had this) so CV deploys don't serve stale cached content ## Deployment and Testing - [ ] `argocd app diff cv` / `argocd app diff docs` from branch - [ ] Deploy from branch: `argocd app set cv --revision feature/ha-cv-docs-zero-downtime && argocd app sync cv` - [ ] Verify 2 pods running: `kubectl get pods -n cv --context=minikube-indri` - [ ] Test rolling restart: `kubectl rollout restart deployment/cv -n cv --context=minikube-indri` - [ ] During rollout, confirm continuous availability via `curl -I https://cv.eblu.me` - [ ] After merge: reset ArgoCD to main, re-sync both apps Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/273	2026-02-26 07:53:21 -08:00
Erich Blume	8e9d89ca73	docs-review: print file path instead of content for LLM usage The LLM should read the file itself using its tools rather than receiving it inline in the task output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:24:37 -08:00
Erich Blume	9a7acffa26	Review manage-forgejo-mirrors doc: clarify cron default, stamp reviewed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:17:18 -08:00
Erich Blume	fb83c5c577	Add explicit ExternalSecret defaults for SSA sync parity The external-secrets webhook injects conversionStrategy, decodingStrategy, and metadataPolicy defaults on admission. Declaring them explicitly prevents ArgoCD SSA from flagging the resource as OutOfSync. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:02:54 -08:00
Erich Blume	db561c6b0e	Upgrade ArgoCD v3.2.6 → v3.3.2 with Server-Side Apply (#272 ) ## Summary - Upgrade ArgoCD from v3.2.6 to v3.3.2 - Enable `ServerSideApply=true` sync option (required by v3.3 — ApplicationSet CRD exceeds client-side apply annotation limit) - Update service-versions.yaml with review for argocd and 1password-connect ## Breaking changes reviewed - Server-Side Apply required: Added to syncOptions ✅ - Source Hydrator git notes: Not used — N/A - Application path cleaning removed: Not used — N/A - Settings API field restriction: Authenticated access only — N/A ## Deployment and Testing - [ ] Sync the `apps` app first (picks up SSA syncOption change) - [ ] `argocd app set argocd --revision feature/argocd-v3.3.2` - [ ] `argocd app sync argocd` - [ ] Verify all argocd pods running with v3.3.2 images - [ ] Verify other apps still sync correctly - [ ] After merge: `argocd app set argocd --revision main && argocd app sync argocd` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/272	2026-02-26 06:51:50 -08:00
Erich Blume	95c8424e62	Add Transmission metrics exporter and Grafana dashboard (#271 ) ## Summary - Add `metalmatze/transmission-exporter` as a sidecar container in the torrent deployment, exposing Prometheus metrics on port 19091 - Add metrics port to the torrent service for Prometheus scraping - Add Prometheus scrape job targeting the transmission exporter - Create Grafana dashboard with: - Overview stats (download/upload speed, active/total torrents) - Transfer speed timeseries (download + upload over time) - Transfer volume stats (total downloaded/uploaded in selected range) - Per-torrent download and upload rate timeseries - Per-torrent details table (ratio, uploaded, percent done) ## Deployment and Testing - [ ] Sync ArgoCD `torrent` app from branch — verify exporter sidecar starts - [ ] Verify exporter metrics: `kubectl exec` into pod, `curl localhost:19091/metrics` - [ ] Verify Prometheus scrapes it: check targets at prometheus.ops.eblu.me - [ ] Open Grafana, find "Transmission" dashboard, verify panels populate - [ ] Sync ArgoCD `prometheus` app from branch - [ ] Sync ArgoCD `grafana-config` app from branch Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/271	2026-02-25 22:23:33 -08:00
Erich Blume	03d71544ec	Add multi-cluster observability with ringtail metrics and dashboards (#270 ) ## Summary - Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs - Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests) - Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki - Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with: - Kubernetes Clusters dashboard — multi-cluster with `cluster` and `namespace` template variables - Ringtail (k3s) dashboard — dedicated ringtail view with GPU usage panels ## Deployment and Testing 1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`) 2. Sync `prometheus` → verify `cluster` label on scraped metrics 3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs 4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs 5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail 6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}` 7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values 8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods ## Notes - Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later - DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve - The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270	2026-02-25 22:01:00 -08:00
Erich Blume	2243f2e0a1	Filter driveway zone to person/dog/cat only in Frigate Parked car was being re-detected every few minutes at night due to IR illumination noise triggering motion detection. Restrict the driveway zone to [person, dog, cat] so cars and birds no longer create events there. Cars still alert via the driveway_entrance zone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 20:45:07 -08:00
Erich Blume	84338c32c2	Add authenticated GitHub PAT for Forgejo mirror sync (#269 ) ## Summary - mirror-create: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time - mirror-update-pats: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run` - app.ini.j2: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility) - manage-forgejo-mirrors: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure ## Context GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors. ## Deployment and Testing - [x] `mirror-update-pats` dry-run verified (23 mirrors detected) - [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri - [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped) - [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config - [ ] Trigger a manual mirror sync and verify success in Forgejo UI Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269	2026-02-25 20:20:23 -08:00
Erich Blume	23dc79058e	Bake default display options into ai-docs mise task The --style=header --color=never --decorations=always flags are now built into the script so callers can just run `mise run ai-docs`. Also adds a note to CLAUDE.md to never truncate the output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:42:47 -08:00
Erich Blume	de54b4e33d	Port CloudNative-PG off Helm to direct release manifest (#268 ) ## Summary - Point ArgoCD app directly at forge-mirrored upstream repo (`mirrors/cloudnative-pg`) instead of the Helm charts repo - Use `directory.include` to select the specific release manifest (`cnpg-1.27.1.yaml`) from the `releases/` directory - No vendored files, no Helm — upgrades are a two-line change (`targetRevision` + `directory.include`) - Delete unused `values.yaml` (was empty, all Helm defaults) ## Deployment and Testing - [ ] Register mirror repo in ArgoCD: `argocd repo add ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git --ssh-private-key-path <key>` - [ ] `argocd app set cloudnative-pg --revision feature/cnpg-direct-source && argocd app sync cloudnative-pg` - [ ] Verify operator pod running: `kubectl get pods -n cnpg-system --context=minikube-indri` - [ ] Verify CRDs exist: `kubectl get crd --context=minikube-indri \| grep cnpg` - [ ] Verify existing clusters healthy: `kubectl get clusters -A --context=minikube-indri` - [ ] After merge: `argocd app set cloudnative-pg --revision main && argocd app sync cloudnative-pg` ## Notes - The forge mirror was created via `mise run mirror-create` from `https://github.com/cloudnative-pg/cloudnative-pg.git` - ArgoCD may need the mirror repo added to its known repositories if the credential template doesn't already match `mirrors/*` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/268	2026-02-25 17:37:53 -08:00
Erich Blume	285ad4141f	Fix Frigate detection events rate metric name in Grafana dashboard The panel queried frigate_camera_events but the actual metric exposed by Frigate is frigate_camera_events_total with a "camera" label (not "camera_name"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 16:51:57 -08:00

311 commits