blumeops

Author	SHA1	Message	Date
Erich Blume	7b0f642066	Exclude upstream placeholder OAuth Secret from kustomize build The upstream manifest includes a Secret with empty client_id/client_secret placeholders. We manage this via ExternalSecret, so drop the upstream copy to avoid ownership conflicts in ArgoCD. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:43:53 -07:00
Erich Blume	2bc0852680	Switch to kustomize remote resource for upstream manifest Use HTTPS raw URL from forge mirror instead of a separate ArgoCD app. Pins operator image to v1.94.2 via kustomize images transformer, avoiding the upstream's floating "stable" tag. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:42:34 -07:00
Erich Blume	56224867fa	Externalize Tailscale operator to forge mirror Replace vendored operator.yaml (495 KB) with ArgoCD apps sourcing the upstream static manifest from mirrors/tailscale on forge, pinned to v1.94.2 via targetRevision. Adds apps for both indri and ringtail clusters. Local kustomization retains only ProxyClass and DNSConfig. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:33:32 -07:00
Forgejo Actions	cb95db0bc9	Update docs release to v1.14.1 - Built changelog from towncrier fragments [skip ci]	2026-03-14 10:11:06 -07:00
Erich Blume	ab8ea6f301	Bump Grafana Alloy to v1.14.0 (#292) ## Summary - Bump alloy-k8s, alloy-ringtail, and alloy-tracing-ringtail image tags from v1.13.1 to v1.14.0 - Mark indri alloy (ansible) as reviewed at v1.14.0 — source rebuild from forge mirror needed - Add missing alloy-ringtail entry to service-versions.yaml - Update alloy reference doc ## Breaking changes reviewed - `loki.secretfilter` options removed — not used in our configs - OTel Collector upgraded to v0.142.0 — Kafka receiver changes don't affect us - Exporter queue default changes — our tracing pipeline (Beyla → batch → otlphttp) uses simple config, low risk ## Deployment and Testing - [ ] Sync alloy-k8s: `argocd app set alloy-k8s --revision bump/alloy-v1.14.0 && argocd app sync alloy-k8s` - [ ] Sync alloy-ringtail: `argocd app set alloy-ringtail --revision bump/alloy-v1.14.0 --server ringtail-argocd && argocd app sync alloy-ringtail` - [ ] Sync alloy-tracing-ringtail similarly - [ ] Verify metrics flowing in Grafana - [ ] Verify traces flowing to Tempo (ringtail) - [ ] Rebuild indri alloy from source (`v1.14.0` tag on forge mirror), SCP to indri, restart - [ ] After merge: reset ArgoCD revisions to main, re-sync Reviewed-on: #292	2026-03-13 16:25:27 -07:00
Erich Blume	c26026f4e9	Bump Ollama memory to 24Gi and enable flash attention The 27B Q4_K_M model needs ~7.3 GiB system RAM for CPU-offloaded layers but only 6.8 GiB was available within the 22Gi cgroup. Bumping to 24Gi and enabling flash attention (reduces KV cache memory) should provide enough headroom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 20:33:22 -07:00
Erich Blume	6d4929a66c	Add qwen3.5:27b to Ollama and bump memory limit to 22Gi The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080 by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory limit needs headroom beyond the previous 16Gi. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:55:51 -07:00
Erich Blume	40f1568088	Remove unused Mosquitto MQTT broker from ringtail Mosquitto has been dormant since frigate-notify switched from MQTT to webapi polling (`529ba10`). Tear down live infra (ArgoCD app, namespace) and remove all manifests, service-versions entry, services-check, and doc references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:37:31 -07:00
Erich Blume	87d4de244b	Review jobsync: add to services-check and homepage (#291) ## Summary - Add jobsync pod check (ringtail k3s) and HTTP endpoint to `services-check` - Add JobSync entry to homepage dashboard under new "Apps" group - Mark jobsync as reviewed at v1.1.4 (current with upstream) - Changelog fragment added ## Deployment and Testing - [ ] Sync homepage app from branch: `argocd app set homepage --revision review/jobsync && argocd app sync homepage` - [ ] Verify JobSync appears on go.ops.eblu.me dashboard - [ ] Run `mise run services-check` to verify new checks pass - [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage` Reviewed-on: #291	2026-03-11 17:36:51 -07:00
Forgejo Actions	ebba3d6e5b	Update docs release to v1.14.0 - Built changelog from towncrier fragments [skip ci]	2026-03-09 12:03:30 -07:00
Erich Blume	0ef5fe5792	Update docs container to v1.28.2-4f0476a (SPA disabled) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 12:00:54 -07:00
Erich Blume	953640d2b7	Deploy docs with fixed robots.txt (v1.28.2-ede9a51) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 20:21:05 -07:00
Erich Blume	770a7b2d6a	Add JobSync reference card, observability docs, and RAPIDAPI_KEY plumbing (#289) ## Summary - Add JobSync service reference card (`docs/reference/services/jobsync.md`) with architecture, secrets, observability, and JSearch API docs - Add JobSync and Ollama to ringtail's workloads table (both were missing) - Add JobSync to the reference index - Wire `RAPIDAPI_KEY` through ExternalSecret and deployment env var for JSearch job search automation - Document Loki log queries for observability (no metrics endpoint exists) - Update deploy-jobsync how-to with new env var, observability section, and reference card link ## Deployment and Testing - [ ] Sign up for RapidAPI JSearch API (free tier: 500 req/month) - [ ] Add `rapidapi_key` field to "JobSync" 1Password item - [ ] Merge PR - [ ] `argocd app sync jobsync` to pick up new env var - [ ] Verify job search works at https://jobsync.ops.eblu.me/dashboard/automations Reviewed-on: #289	2026-03-08 15:06:52 -07:00
Erich Blume	c9270c7645	Update jobsync image to v1.1.4-3a811fb-nix (main build) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 11:13:34 -07:00
Erich Blume	3a811fb188	Deploy JobSync — job search tracker on ringtail k3s (#288) ## Summary C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster. ### Mikado Graph ``` deploy-jobsync (goal) ├── build-jobsync-container │ └── mirror-jobsync └── integrate-jobsync-ollama ``` ### What is JobSync? Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching. ### Key Decisions - Ringtail k3s (not minikube-indri) — colocates with Ollama for zero-latency AI - Nix container via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge - Ollama for AI — uses existing deployment, no API keys needed for AI features - No upstream fork — vanilla JobSync, Anthropic AI deferred to future work if needed ### Current Status Planning phase — cards committed, ready for review before implementation begins. Reviewed-on: #288	2026-03-08 11:02:05 -07:00
Erich Blume	14e931591b	Fix 1Password Connect numeric log levels misclassified in Grafana (#287) ## Summary - 1Password Connect uses non-standard numeric log levels (`1`=error, `2`=warn, `3`=info, `4`=debug, `5`=trace) per [1Password/connect#44](https://github.com/1Password/connect/issues/44) - Alloy extracts the `level` JSON field as-is, so info-level health checks get `level="3"` in Loki - Grafana expects string level labels — numeric values are unrecognized, causing misclassified log severity/coloring - Adds a `stage.match` + `stage.template` in the Alloy pipeline scoped to `{namespace="1password"}` to normalize numeric levels to standard strings - Other services are completely unaffected (scoped by namespace, not global) ## Deployment and Testing - [ ] Sync alloy-k8s from branch: `argocd app set alloy-k8s --revision fix/onepassword-numeric-log-levels && argocd app sync alloy-k8s` - [ ] Wait ~2 minutes for new logs to flow - [ ] Verify level labels: `curl -sG "http://localhost:3100/loki/api/v1/label/level/values" --data-urlencode 'query={namespace="1password"}'` should show `"info"` and `"warn"` instead of `"3"` and `"2"` - [ ] Check Grafana log panel for 1password namespace — logs should no longer appear as errors - [ ] After merge: `argocd app set alloy-k8s --revision main && argocd app sync alloy-k8s` Reviewed-on: #287	2026-03-07 13:57:04 -08:00
Erich Blume	590cb1d25d	Document required preview directory for Frigate NFS volume Frigate 0.17 does not auto-create clips/previews/<camera>/, causing review page previews to silently fail with 500 errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 08:46:23 -08:00
Forgejo Actions	2809ba6f50	Update docs release to v1.13.3 - Built changelog from towncrier fragments [skip ci]	2026-03-06 20:49:01 -08:00
Erich Blume	b793299d6d	Upgrade Dagger engine from v0.20.0 to v0.20.1 Phase 2 of Dagger upgrade: bump engine version, update runner deployment to v0.20.1-24f7512, and fix docs reference card version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 20:41:02 -08:00
Forgejo Actions	e95fb9a555	Update docs release to v1.13.2 - Built changelog from towncrier fragments [skip ci]	2026-03-06 19:03:24 -08:00
Erich Blume	a7c21bd8a6	Update docs quartz container to v1.28.2-b64010b Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 18:58:40 -08:00
Forgejo Actions	8b0ff3d7a5	Update docs release to v1.13.1 - Built changelog from towncrier fragments [skip ci]	2026-03-06 10:00:42 -08:00
Erich Blume	1537412c09	Update docs quartz container to v1.28.2-6636576 Picks up spider-trap nginx guards from `6636576`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 09:52:31 -08:00
Erich Blume	6e8d11c6bb	Add :kustomized sentinel tag to manifest images, review devpi Bare image references in manifests were ambiguous — unclear whether the tag was intentionally omitted or managed by kustomize. Add :kustomized sentinel to all 37 image refs overridden by kustomize images transformer. Add sync notes for tailscale-operator proxyclass (CRD fields not processed by kustomize). Mark devpi reviewed (6.19.1 is current). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 08:15:06 -08:00
Forgejo Actions	d98ef984ea	Update docs release to v1.13.0 - Built changelog from towncrier fragments [skip ci]	2026-03-05 11:11:38 -08:00
Erich Blume	46cc3fbc2e	Update forgejo-runner job image to v0.20.0-448689b Built locally to break the chicken-and-egg: the old runner couldn't build its own replacement because it needed Dagger 0.20.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 11:05:21 -08:00
Erich Blume	c281fb5403	Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286) ## Summary Adds the third observability pillar — distributed tracing — alongside existing metrics (Prometheus) and logs (Loki). - Grafana Tempo 2.10.1 on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus - Beyla eBPF auto-instrumentation via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes - Grafana integration — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking - Prometheus scrapes Tempo operational metrics ### Architecture ``` ringtail (k3s) indri (minikube) ┌──────────────────────┐ ┌─────────────────────┐ │ Alloy+Beyla (eBPF) │──OTLP HTTP────────→ │ Tempo │ │ ↳ Frigate, ntfy, │ via tailnet │ ↳ trace storage │ │ Ollama, Immich │ │ ↳ RED → Prometheus │ └──────────────────────┘ │ │ │ Grafana │ │ ↳ Tempo datasource │ └─────────────────────┘ ``` ### New files (12) - `docs/reference/services/tempo.md` — reference doc - `docs/changelog.d/feature-otel-tracing.feature.md` - `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files) - `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files) ### Modified files (6) - `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields - `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target - `service-versions.yaml` — tempo + alloy-tracing-ringtail entries - `docs/reference/services/grafana.md` — Tempo in datasources table - `docs/reference/reference.md` — Tempo in services index - `docs/reference/operations/observability.md` — Tempo in components list ## Deployment and Testing - [ ] Sync `apps` app to pick up new Application definitions - [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo` - [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo` - [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready` - [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring` - [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail` - [ ] Check Beyla discovery in alloy-tracing logs on ringtail - [ ] Sync grafana-config for updated datasources - [ ] Sync prometheus for updated scrape config - [ ] Test Grafana Tempo datasource connection - [ ] Generate test traffic and search traces in Grafana Explore → Tempo - [ ] After merge: reset all ArgoCD app revisions back to main Reviewed-on: #286	2026-03-05 10:51:07 -08:00
Erich Blume	7bddc78c8a	Add ExternalSecret default fields to prevent ArgoCD drift The external-secrets operator adds conversionStrategy, decodingStrategy, and metadataPolicy defaults to the live object, causing perpetual OutOfSync in ArgoCD. Declare them explicitly to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 09:11:23 -08:00
Erich Blume	405fc59c12	Add Authentik OIDC login for ArgoCD (#284) ## Summary - Add Authentik OAuth2 provider + application blueprint for ArgoCD (ringtail side) - Add OIDC config to ArgoCD ConfigMap with Authentik as identity provider (indri side) - Map Authentik `admins` group to ArgoCD `role:admin` via RBAC policy - ExternalSecrets on both sides pull `argocd-client-secret` from 1Password - Local admin password remains as break-glass — both login methods coexist ## Pre-deployment manual step Add `argocd-client-secret` field to "Authentik (blumeops)" in 1Password with a random value (e.g., `openssl rand -hex 32`). ## Deployment order 1. Sync Authentik app on ringtail first (blueprint + secret + worker env var) 2. Sync ArgoCD app on indri second (cm, rbac, ExternalSecret) ## Verification - [ ] `argocd-client-secret` field added to 1Password - [ ] Authentik app synced on ringtail — blueprint applied, provider created - [ ] ArgoCD app synced on indri — OIDC config applied - [ ] SSO login works: visit `https://argocd.ops.eblu.me` → "Log in via Authentik" → admin access - [ ] Break-glass: local admin/password login still works Reviewed-on: #284	2026-03-05 09:07:25 -08:00
Erich Blume	91c755ddd6	Pin kiwix-serve image tag to v3.8.2-f6f0f79 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 08:17:40 -08:00
Erich Blume	75814e032c	Pin transmission-exporter image tag to v1.0.1-c93448f Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 08:05:17 -08:00
Erich Blume	797133b28e	Fix per-torrent rate panels showing cumulative bytes instead of rates Dashboard "Download/Upload Rate by Torrent" panels were querying transmission_torrent_download_bytes (total_size * percent_done) and transmission_torrent_upload_bytes (uploaded_ever) — cumulative byte gauges, not rates. Added new metrics using Transmission's native rate_download/rate_upload and updated dashboard queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 08:01:37 -08:00
Erich Blume	6ae18cde1e	Pin transmission-exporter image tag to v1.0.0-f2704b2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 21:55:59 -08:00
Erich Blume	f2704b26da	Replace transmission-exporter with homegrown Python exporter (#283) ## Summary - Replace unmaintained `metalmatze/transmission-exporter:master` sidecar with a homegrown Python exporter - Uses `prometheus_client` + `transmission-rpc` with collect-on-scrape pattern (fresh metrics per scrape, no stale labels) - Same metric names so existing Grafana Transmission dashboard works unchanged - Container built with `uv` for dependency management, follows `grafana-sidecar` Dockerfile pattern ## Changes - New: `containers/transmission-exporter/exporter.py` — single-file exporter (~130 lines) - New: `containers/transmission-exporter/Dockerfile` — multi-stage Alpine build with uv - Modified: `argocd/manifests/torrent/deployment.yaml` — swap sidecar image reference - Modified: `argocd/manifests/torrent/kustomization.yaml` — add image tag entry - Modified: `service-versions.yaml` — add transmission-exporter entry ## Deployment and Testing - [ ] Build container: `mise run container-build-and-release transmission-exporter` - [ ] Update kustomization.yaml newTag with build SHA - [ ] Branch deploy: `argocd app set torrent --revision feature/transmission-exporter-python && argocd app sync torrent` - [ ] Verify metrics: `kubectl -n torrent --context=minikube-indri port-forward svc/transmission 19091:19091` then `curl localhost:19091/metrics | grep transmission_` - [ ] Verify Grafana Transmission dashboard panels populate - [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent` Reviewed-on: #283	2026-03-04 21:55:00 -08:00
Erich Blume	91d84e54d5	Replace OOMKilled stat with detail table, shrink waiting reason panel The count-only stat wasn't actionable. New table shows pod name, container, restart count, and memory limit for each OOMKilled container. Waiting reason panel narrowed to make room. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:58:11 -08:00
Erich Blume	008da43736	Add OOMKill observability to Kubernetes Clusters dashboard OOMKilled containers previously only appeared briefly in "Unhealthy Pods" while dying, then vanished on restart. New panels use persistent metrics (last_terminated_reason) and restart rate tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:53:07 -08:00
Erich Blume	e90c287504	Add qwen3.5:9b to Ollama model list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 19:49:39 -08:00
Erich Blume	b460333da0	Upgrade Transmission to 4.1.1 (#282) ## Summary - Upgrade Transmission from 4.0.6-r4 to 4.1.1-r1 - Uses Alpine edge community repo for transmission packages, keeping stable alpine:3.22 base - Fix stale image reference in service doc (was linuxserver, now custom registry image) - Mark transmission as reviewed in service-versions.yaml ## Context Service review found Transmission two minor versions behind (4.0.6 → 4.1.1). Alpine 3.22 only packages 4.0.6, so transmission is installed from edge's community repo with an exact version pin. 4.1.0 added improved µTP performance, IPv6/dual-stack UDP tracker, JSON-RPC 2.0 API. 4.1.1 is a bugfix release (20+ fixes). Dagger test build passed locally. ## Deployment and Testing - [ ] Build container via Forgejo workflow (`mise run container-build-and-release transmission`) - [ ] Update kustomization.yaml with new image tag - [ ] `argocd app set torrent --revision feature/transmission-review && argocd app sync torrent` - [ ] Verify web UI at https://torrent.ops.eblu.me - [ ] Check Grafana Transmission dashboard still receives metrics - [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent` ## Note The transmission-exporter sidecar (OOMKilling every ~30min, 294 restarts) is being tracked separately as a future replacement project. Reviewed-on: #282	2026-03-04 07:44:33 -08:00
Erich Blume	d7f0aa6f96	Fix Frigate database path to use persistent volume The database was at /config/frigate.db (emptyDir, ephemeral) instead of /db/frigate.db (PVC, persistent). Every pod restart wiped the database, losing all recording history and leaving orphaned files on NFS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 15:18:16 -08:00
Erich Blume	135883079c	Bump frigate memory limit from 2Gi to 3Gi ONNX detector + CUDA ffmpeg + workers consume ~1.9Gi at steady state, causing intermittent OOMKills at the 2Gi limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:57:15 -08:00
Erich Blume	3d065b94f9	Pin grafana-sidecar to main build tag v1.28.0-a2bb9ab (built from merge commit on main). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:51:01 -08:00
Erich Blume	a2bb9abbdb	Home-build grafana-sidecar container (#281) ## Summary - Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0` - Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop - Adds Dockerfile, service-versions entry, docs, and changelog fragment - Manifest switch to home-built image pending container build ## Deployment and Testing - [ ] `mise run container-build-and-release grafana-sidecar` - [ ] Update kustomization.yaml with built image tag - [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana` - [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me - [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana` Reviewed-on: #281	2026-03-03 13:48:24 -08:00
Erich Blume	876e51dd77	Allow implicit octals in yamllint and normalize k8s mode values Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:10:44 -08:00
Erich Blume	eceea2126b	Add Gandi bookmark to homepage dashboard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:05:50 -08:00
Erich Blume	51626e6630	Update Loki to v3.6.5-3dc4ed7 container image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:01:49 -08:00
Erich Blume	3dc4ed730b	Build Loki container image locally (#280) ## Summary - Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/` - Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki` - Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build ## Details - UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup) - CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build` - Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session) - Pattern follows miniflux (two-stage Go) + prometheus (ldflags) ## Deployment and Testing - [ ] Trigger container build: `mise run container-build-and-release loki` - [ ] Update kustomize tag to actual build tag - [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki` - [ ] Verify `/ready` endpoint and log ingestion - [ ] After merge: update to `[main]` tag (C0 follow-up) Reviewed-on: #280	2026-03-03 13:00:43 -08:00
Erich Blume	f914a14653	Update teslamate to v3.0.0-eb9bc57 container image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 12:02:26 -08:00
Erich Blume	01d3b4d1c7	Switch forgejo-runner ArgoCD app to internal SSH repo URL Was the only app still using https://forge.eblu.me (public proxy) for git polling. All other apps already use the internal SSH endpoint at forge.ops.eblu.me. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:43:01 -08:00
Erich Blume	82884436df	Route runner polling through internal forge.ops.eblu.me The k8s and ringtail runners were hitting forge.eblu.me (fly.io proxy) for every FetchTask poll (~every 2s), round-tripping through the public internet unnecessarily. Use forge.ops.eblu.me (Caddy on indri, tailnet) for infrastructure workloads. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:33:40 -08:00
Erich Blume	7b68be2e80	Add fly.io proxy observability and app logs to Forgejo dashboard Rename "Forgejo Repository Health" to "Forgejo" and add proxy metrics (request rate, error rate, RPS, latency, bandwidth), proxy access logs, and Forgejo application logs from Loki. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:24:53 -08:00

283 commits