Commit graph

269 commits

Author SHA1 Message Date
3a811fb188 Deploy JobSync — job search tracker on ringtail k3s (#288)
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (jobsync) (push) Successful in 2s
Build Container (Nix) / build (jobsync) (push) Successful in 8s
## Summary

C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.

### Mikado Graph

```
deploy-jobsync (goal)
├── build-jobsync-container
│   └── mirror-jobsync
└── integrate-jobsync-ollama
```

### What is JobSync?

A Next.js app backed by SQLite for tracking job applications. It offers resume management, application pipeline tracking, and AI-powered resume review and job matching.

### Key Decisions

- **Ringtail k3s** (not minikube-indri): colocates with Ollama, so AI requests stay on the same host instead of crossing the tailnet
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed

### Current Status

Planning phase — cards committed, ready for review before implementation begins.

Reviewed-on: #288
2026-03-08 11:02:05 -07:00
14e931591b Fix 1Password Connect numeric log levels misclassified in Grafana (#287)
## Summary
- 1Password Connect uses non-standard numeric log levels (`1`=error, `2`=warn, `3`=info, `4`=debug, `5`=trace) per [1Password/connect#44](https://github.com/1Password/connect/issues/44)
- Alloy extracts the `level` JSON field as-is, so info-level health checks get `level="3"` in Loki
- Grafana expects string level labels — numeric values are unrecognized, causing misclassified log severity/coloring
- Adds a `stage.match` + `stage.template` in the Alloy pipeline scoped to `{namespace="1password"}` to normalize numeric levels to standard strings
- Other services are completely unaffected (scoped by namespace, not global)
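The normalization the Alloy `stage.match` + `stage.template` pair performs is equivalent to the Python sketch below. It is not the Alloy config. The function name `normalize_level` is hypothetical, and the mapping is the one quoted from 1Password/connect#44 above.

```python
# Numeric level -> string level, per 1Password/connect#44.
ONEPASSWORD_LEVELS = {
    "1": "error",
    "2": "warn",
    "3": "info",
    "4": "debug",
    "5": "trace",
}


def normalize_level(level: str, namespace: str) -> str:
    """Map a numeric 1Password Connect level to a Grafana-friendly string.

    Only logs from the 1password namespace are rewritten; other namespaces,
    and unknown values, pass through unchanged.
    """
    if namespace != "1password":
        return level
    return ONEPASSWORD_LEVELS.get(level, level)


print(normalize_level("3", "1password"))   # info
print(normalize_level("3", "monitoring"))  # 3
```

Scoping on namespace is what keeps a service that legitimately logs `level="3"` elsewhere from being rewritten.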

## Deployment and Testing
- [ ] Sync alloy-k8s from branch: `argocd app set alloy-k8s --revision fix/onepassword-numeric-log-levels && argocd app sync alloy-k8s`
- [ ] Wait ~2 minutes for new logs to flow
- [ ] Verify level labels: `curl -sG "http://localhost:3100/loki/api/v1/label/level/values" --data-urlencode 'query={namespace="1password"}'` should show `"info"` and `"warn"` instead of `"3"` and `"2"`
- [ ] Check Grafana log panel for 1password namespace — logs should no longer appear as errors
- [ ] After merge: `argocd app set alloy-k8s --revision main && argocd app sync alloy-k8s`

Reviewed-on: #287
2026-03-07 13:57:04 -08:00
590cb1d25d Document required preview directory for Frigate NFS volume
Frigate 0.17 does not auto-create clips/previews/<camera>/, causing
review page previews to silently fail with 500 errors.
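Creating the missing directories is an idempotent one-liner per camera. The Python sketch below is a hypothetical helper, not something the commit adds. The clips root (for example `/media/frigate/clips`) and the camera names you pass in are assumptions about your deployment.

```python
from pathlib import Path


def ensure_preview_dirs(clips_root: Path, cameras: list[str]) -> list[Path]:
    """Create clips/previews/<camera>/ for each camera; safe to re-run."""
    created = []
    for camera in cameras:
        preview_dir = clips_root / "previews" / camera
        # parents=True creates previews/ if needed; exist_ok=True makes re-runs no-ops.
        preview_dir.mkdir(parents=True, exist_ok=True)
        created.append(preview_dir)
    return created
```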

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:46:23 -08:00
Forgejo Actions
2809ba6f50 Update docs release to v1.13.3
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 20:49:01 -08:00
b793299d6d Upgrade Dagger engine from v0.20.0 to v0.20.1
Phase 2 of Dagger upgrade: bump engine version, update runner
deployment to v0.20.1-24f7512, and fix docs reference card version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:41:02 -08:00
Forgejo Actions
e95fb9a555 Update docs release to v1.13.2
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 19:03:24 -08:00
a7c21bd8a6 Update docs quartz container to v1.28.2-b64010b
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:58:40 -08:00
Forgejo Actions
8b0ff3d7a5 Update docs release to v1.13.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 10:00:42 -08:00
1537412c09 Update docs quartz container to v1.28.2-6636576
Picks up spider-trap nginx guards from 6636576.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:52:31 -08:00
6e8d11c6bb Add :kustomized sentinel tag to manifest images, review devpi
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).
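A sentinel tag makes bare references mechanically detectable. The Python sketch below is a hypothetical lint and is not part of the commit. It treats a colon as a tag separator only after the last `/`, so a registry port such as `localhost:5000/app` is not mistaken for a tag. Digest-pinned refs (`@sha256:...`) count as unambiguous.

```python
from typing import Optional


def image_tag(ref: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    name = ref.split("@", 1)[0]             # drop any @digest suffix
    last_segment = name.rsplit("/", 1)[-1]  # ports live before the last '/'
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def is_ambiguous(ref: str) -> bool:
    """True for a bare ref: no tag and no digest pinning it."""
    if "@" in ref:
        return False
    return image_tag(ref) is None


def find_bare_refs(refs: list[str]) -> list[str]:
    return [ref for ref in refs if is_ambiguous(ref)]
```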

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 08:15:06 -08:00
Forgejo Actions
d98ef984ea Update docs release to v1.13.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-05 11:11:38 -08:00
46cc3fbc2e Update forgejo-runner job image to v0.20.0-448689b
Built locally to break the chicken-and-egg: the old runner couldn't
build its own replacement because it needed Dagger 0.20.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 11:05:21 -08:00
c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics
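RED stands for Rate, Errors, Duration. Tempo's `metrics_generator` emits these as Prometheus counters and histograms, and PromQL then derives rates from them. The Python sketch below computes the same three numbers directly from a batch of spans to show what they mean. The `Span` shape and `red_metrics` are hypothetical and do not reflect Tempo's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    duration_ms: float
    error: bool


def red_metrics(spans: list[Span], window_s: float) -> dict[str, dict[str, float]]:
    """Per-service Rate (req/s), Errors (ratio), Duration (mean ms) over a window."""
    grouped: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        grouped[span.service].append(span)
    result = {}
    for service, group in grouped.items():
        count = len(group)
        errors = sum(1 for s in group if s.error)
        result[service] = {
            "rate": count / window_s,
            "error_ratio": errors / count,
            "mean_duration_ms": sum(s.duration_ms for s in group) / count,
        }
    return result
```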

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP─────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich    │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
7bddc78c8a Add ExternalSecret default fields to prevent ArgoCD drift
The external-secrets operator adds conversionStrategy, decodingStrategy,
and metadataPolicy defaults to the live object, causing perpetual
OutOfSync in ArgoCD. Declare them explicitly to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 09:11:23 -08:00
405fc59c12 Add Authentik OIDC login for ArgoCD (#284)
## Summary
- Add Authentik OAuth2 provider + application blueprint for ArgoCD (ringtail side)
- Add OIDC config to ArgoCD ConfigMap with Authentik as identity provider (indri side)
- Map Authentik `admins` group to ArgoCD `role:admin` via RBAC policy
- ExternalSecrets on both sides pull `argocd-client-secret` from 1Password
- Local admin password remains as break-glass — both login methods coexist

## Pre-deployment manual step
Add `argocd-client-secret` field to "Authentik (blumeops)" in 1Password with a random value (e.g., `openssl rand -hex 32`).

## Deployment order
1. Sync Authentik app on ringtail first (blueprint + secret + worker env var)
2. Sync ArgoCD app on indri second (cm, rbac, ExternalSecret)

## Verification
- [ ] `argocd-client-secret` field added to 1Password
- [ ] Authentik app synced on ringtail — blueprint applied, provider created
- [ ] ArgoCD app synced on indri — OIDC config applied
- [ ] SSO login works: visit `https://argocd.ops.eblu.me` → "Log in via Authentik" → admin access
- [ ] Break-glass: local admin/password login still works

Reviewed-on: #284
2026-03-05 09:07:25 -08:00
91c755ddd6 Pin kiwix-serve image tag to v3.8.2-f6f0f79
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:17:40 -08:00
75814e032c Pin transmission-exporter image tag to v1.0.1-c93448f
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:05:17 -08:00
797133b28e Fix per-torrent rate panels showing cumulative bytes instead of rates
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 2s
Build Container / build (transmission-exporter) (push) Successful in 38s
Dashboard "Download/Upload Rate by Torrent" panels were querying
transmission_torrent_download_bytes (total_size * percent_done) and
transmission_torrent_upload_bytes (uploaded_ever) — cumulative byte
gauges, not rates. Added new metrics using Transmission's native
rate_download/rate_upload and updated dashboard queries.
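Plotting a cumulative series produces a rising line, not throughput. The Python sketch below shows how a per-second rate falls out of consecutive cumulative samples, conceptually similar to PromQL `rate()` on a counter such as `uploaded_ever`. The fix in this commit avoids the calculation entirely by exporting Transmission's own `rate_download`/`rate_upload` fields. `rate_from_counter` is a hypothetical helper.

```python
def rate_from_counter(samples: list[tuple[float, float]]) -> list[float]:
    """Per-second rates between consecutive (timestamp_s, cumulative_bytes) samples.

    A drop in the cumulative value is treated as a reset to zero (e.g. the
    torrent was removed and re-added), as PromQL does for counters.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append(delta / (t1 - t0))
    return rates
```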

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:01:37 -08:00
6ae18cde1e Pin transmission-exporter image tag to v1.0.0-f2704b2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 21:55:59 -08:00
f2704b26da Replace transmission-exporter with homegrown Python exporter (#283)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 2s
Build Container / build (transmission-exporter) (push) Successful in 19s
## Summary
- Replace unmaintained `metalmatze/transmission-exporter:master` sidecar with a homegrown Python exporter
- Uses `prometheus_client` + `transmission-rpc` with collect-on-scrape pattern (fresh metrics per scrape, no stale labels)
- Same metric names so existing Grafana Transmission dashboard works unchanged
- Container built with `uv` for dependency management, follows `grafana-sidecar` Dockerfile pattern
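The collect-on-scrape pattern is what removes stale labels. Every scrape rebuilds the output from a fresh RPC result, so a torrent deleted from Transmission simply stops appearing. In `prometheus_client` this is a custom collector whose `collect()` yields new metric families. The stdlib-only Python sketch below renders the text exposition format by hand to make the idea visible. The metric name and the `fetch_torrents` callable are hypothetical, not the exporter's actual code.

```python
from typing import Callable, Iterable


def _escape(value: str) -> str:
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(fetch_torrents: Callable[[], Iterable[dict]]) -> str:
    """Build exposition text from scratch on every scrape; nothing is cached."""
    lines = [
        "# HELP transmission_torrent_download_rate Download rate in bytes/s",
        "# TYPE transmission_torrent_download_rate gauge",
    ]
    for t in fetch_torrents():
        name = _escape(t["name"])
        lines.append(f'transmission_torrent_download_rate{{name="{name}"}} {t["rate_download"]}')
    return "\n".join(lines) + "\n"
```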

## Changes
- **New:** `containers/transmission-exporter/exporter.py` — single-file exporter (~130 lines)
- **New:** `containers/transmission-exporter/Dockerfile` — multi-stage Alpine build with uv
- **Modified:** `argocd/manifests/torrent/deployment.yaml` — swap sidecar image reference
- **Modified:** `argocd/manifests/torrent/kustomization.yaml` — add image tag entry
- **Modified:** `service-versions.yaml` — add transmission-exporter entry

## Deployment and Testing
- [ ] Build container: `mise run container-build-and-release transmission-exporter`
- [ ] Update kustomization.yaml newTag with build SHA
- [ ] Branch deploy: `argocd app set torrent --revision feature/transmission-exporter-python && argocd app sync torrent`
- [ ] Verify metrics: `kubectl -n torrent --context=minikube-indri port-forward svc/transmission 19091:19091` then `curl localhost:19091/metrics | grep transmission_`
- [ ] Verify Grafana Transmission dashboard panels populate
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

Reviewed-on: #283
2026-03-04 21:55:00 -08:00
91d84e54d5 Replace OOMKilled stat with detail table, shrink waiting reason panel
The count-only stat wasn't actionable. New table shows pod name, container,
restart count, and memory limit for each OOMKilled container. Waiting reason
panel narrowed to make room.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:58:11 -08:00
008da43736 Add OOMKill observability to Kubernetes Clusters dashboard
OOMKilled containers previously only appeared briefly in "Unhealthy Pods"
while dying, then vanished on restart. New panels use persistent metrics
(last_terminated_reason) and restart rate tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:53:07 -08:00
e90c287504 Add qwen3.5:9b to Ollama model list
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 19:49:39 -08:00
b460333da0 Upgrade Transmission to 4.1.1 (#282)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (transmission) (push) Successful in 2s
Build Container / build (transmission) (push) Successful in 6s
## Summary
- Upgrade Transmission from 4.0.6-r4 to 4.1.1-r1
- Uses Alpine edge community repo for transmission packages, keeping stable alpine:3.22 base
- Fix stale image reference in service doc (was linuxserver, now custom registry image)
- Mark transmission as reviewed in service-versions.yaml

## Context
Service review found Transmission two releases behind (4.0.6 installed; 4.1.0 and 4.1.1 available). Alpine 3.22 only packages 4.0.6, so transmission is installed from edge's community repo with an exact version pin.

4.1.0 improved µTP performance and added IPv6/dual-stack UDP tracker support and a JSON-RPC 2.0 API. 4.1.1 is a bugfix release (20+ fixes).

Dagger test build passed locally.

## Deployment and Testing
- [ ] Build container via Forgejo workflow (`mise run container-build-and-release transmission`)
- [ ] Update kustomization.yaml with new image tag
- [ ] `argocd app set torrent --revision feature/transmission-review && argocd app sync torrent`
- [ ] Verify web UI at https://torrent.ops.eblu.me
- [ ] Check Grafana Transmission dashboard still receives metrics
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

## Note
The transmission-exporter sidecar (OOMKilling every ~30min, 294 restarts) is being tracked separately as a future replacement project.

Reviewed-on: #282
2026-03-04 07:44:33 -08:00
d7f0aa6f96 Fix Frigate database path to use persistent volume
The database was at /config/frigate.db (emptyDir, ephemeral) instead of
/db/frigate.db (PVC, persistent). Every pod restart wiped the database,
losing all recording history and leaving orphaned files on NFS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:18:16 -08:00
135883079c Bump frigate memory limit from 2Gi to 3Gi
ONNX detector + CUDA ffmpeg + workers consume ~1.9Gi at steady state,
causing intermittent OOMKills at the 2Gi limit.
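The headroom arithmetic behind the bump, using the ~1.9Gi steady-state figure above, in a short Python sketch:

```python
def headroom(usage_gi: float, limit_gi: float) -> float:
    """Fraction of the memory limit left free at steady state."""
    return 1 - usage_gi / limit_gi


for limit in (2, 3):
    print(f"{limit}Gi limit: {headroom(1.9, limit):.0%} free")
# 2Gi limit: 5% free
# 3Gi limit: 37% free
```

Roughly 5% slack leaves almost no room for transient spikes, which is consistent with the intermittent OOMKills.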

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:57:15 -08:00
3d065b94f9 Pin grafana-sidecar to main build tag
v1.28.0-a2bb9ab (built from merge commit on main).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:51:01 -08:00
a2bb9abbdb Home-build grafana-sidecar container (#281)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (grafana-sidecar) (push) Successful in 2s
Build Container / build (grafana-sidecar) (push) Successful in 6s
## Summary
- Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0`
- Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop
- Adds Dockerfile, service-versions entry, docs, and changelog fragment
- Manifest switch to home-built image pending container build

## Deployment and Testing
- [ ] `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization.yaml with built image tag
- [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana`
- [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me
- [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana`

Reviewed-on: #281
2026-03-03 13:48:24 -08:00
876e51dd77 Allow implicit octals in yamllint and normalize k8s mode values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:10:44 -08:00
eceea2126b Add Gandi bookmark to homepage dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:05:50 -08:00
51626e6630 Update Loki to v3.6.5-3dc4ed7 container image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:01:49 -08:00
3dc4ed730b Build Loki container image locally (#280)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (loki) (push) Successful in 2s
Build Container / build (loki) (push) Successful in 7s
## Summary
- Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/`
- Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki`
- Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build

## Details
- UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup)
- CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build`
- Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session)
- Pattern follows miniflux (two-stage Go) + prometheus (ldflags)

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release loki`
- [ ] Update kustomize tag to actual build tag
- [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki`
- [ ] Verify `/ready` endpoint and log ingestion
- [ ] After merge: update to `[main]` tag (C0 follow-up)

Reviewed-on: #280
2026-03-03 13:00:43 -08:00
f914a14653 Update teslamate to v3.0.0-eb9bc57 container image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 12:02:26 -08:00
01d3b4d1c7 Switch forgejo-runner ArgoCD app to internal SSH repo URL
Was the only app still using https://forge.eblu.me (public proxy) for
git polling. All other apps already use the internal SSH endpoint at
forge.ops.eblu.me.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:43:01 -08:00
82884436df Route runner polling through internal forge.ops.eblu.me
The k8s and ringtail runners were hitting forge.eblu.me (fly.io proxy)
for every FetchTask poll (~every 2s), round-tripping through the public
internet unnecessarily. Use forge.ops.eblu.me (Caddy on indri, tailnet)
for infrastructure workloads.
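For scale, here is a back-of-the-envelope count of the polls that were leaving the tailnet, assuming both runners poll at the "~every 2s" cadence stated above:

```python
SECONDS_PER_DAY = 24 * 60 * 60
POLL_INTERVAL_S = 2  # "~every 2s" per the commit message
RUNNERS = 2          # the k8s and ringtail runners

polls_per_day = RUNNERS * SECONDS_PER_DAY // POLL_INTERVAL_S
print(polls_per_day)  # 86400
```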

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:33:40 -08:00
7b68be2e80 Add fly.io proxy observability and app logs to Forgejo dashboard
Rename "Forgejo Repository Health" to "Forgejo" and add proxy metrics
(request rate, error rate, RPS, latency, bandwidth), proxy access logs,
and Forgejo application logs from Loki.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:24:53 -08:00
86a0dee000 Remove ollama LAN NodePort service
The sanctioned ingress is ollama.ops.eblu.me via tailnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:00:05 -08:00
3af346f1cd Move ollama LAN NodePort to port 80
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:37:50 -08:00
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m28s
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
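nginx's `limit_req` is documented as a leaky-bucket limiter. The Python sketch below models that behavior loosely to show what "3r/s" means in practice: with no burst, requests closer together than 1/3 s are rejected. It is not nginx's implementation. The `burst` value here is a parameter of the sketch, since the proxy's actual burst setting is not stated. Also note that nginx rejects with 503 unless `limit_req_status` is set, which is presumably how the proxy returns the 429 that the checklist expects.

```python
from typing import Optional


class LeakyBucket:
    """Leaky-bucket limiter loosely modeled on nginx limit_req.

    rate:  requests per second that drain from the bucket.
    burst: excess requests tolerated above the steady rate (0 = none).
    """

    def __init__(self, rate: float, burst: int = 0):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is None:
            excess = 0.0
        else:
            # Drain the bucket for the elapsed time, then add this request.
            excess = max(0.0, self.excess - (now - self.last) * self.rate + 1)
        if excess > self.burst:
            return False  # rejected requests leave the bucket state unchanged
        self.excess = excess
        self.last = now
        return True
```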

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
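The fail2ban check ("5 failed logins in 10 minutes") corresponds to fail2ban's `maxretry`/`findtime` pair. The Python sketch below implements that sliding-window rule to clarify what the test should trigger. Ban duration and unbanning are left out, and the class is hypothetical, not fail2ban's code.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Ban an IP once it has max_retry failures within find_time seconds."""

    def __init__(self, max_retry: int = 5, find_time: float = 600.0):
        self.max_retry = max_retry
        self.find_time = find_time
        self.failures: dict[str, deque] = defaultdict(deque)
        self.banned: set[str] = set()

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed login; return True if the IP is (now) banned."""
        window = self.failures[ip]
        window.append(now)
        # Drop failures that have aged out of the find_time window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned.add(ip)
        return ip in self.banned
```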

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
a32c99a252 Limit ollama to one loaded model and one parallel request
Prevents OOM when switching between models: only one 14B model,
plus its KV cache for context, fits in 16GB of VRAM at a time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:23:12 -08:00
203e3cd567 Add NodePort service for ollama LAN access
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:57:18 -08:00
31d925814f Deploy Ollama LLM server on ringtail (#277)
## Summary
- Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration
- Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern)
- Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b`
- hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi)
- Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet
- Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080
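Declarative model management reduces to a set difference between `models.txt` and what is already installed. The Python sketch below shows that reconcile step. It assumes one model per line with `#` comments allowed, which may differ from the actual file format. It is also not confirmed that the sync script removes models at all. The resulting lists could drive `ollama pull` / `ollama rm`, and both functions are hypothetical.

```python
def parse_models(text: str) -> set[str]:
    """Read models.txt: one model per line; blanks and # comments ignored."""
    models = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            models.add(line)
    return models


def plan_sync(desired: set[str], installed: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_pull, to_remove), sorted for stable, reviewable output."""
    return sorted(desired - installed), sorted(installed - desired)
```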

## Deployment and Testing
- [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin`
- [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2`
- [ ] Sync `apps` app with `--revision feature/ollama-ringtail`
- [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama`
- [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail`
- [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags`
- [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'`
- [ ] Verify Frigate still works after GPU sharing change
- [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277
2026-03-02 20:39:51 -08:00
Forgejo Actions
0f79c61c42 Update docs release to v1.12.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-02 18:17:07 -08:00
Forgejo Actions
847e47eaf3 Update docs release to v1.12.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-01 17:24:09 -08:00
503775085d Deploy authentik 2026.2.0 with migration ordering fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:32:10 -08:00
90621e4155 Deploy authentik 2026.2.0 with entry_points fix
Update image tag to v2026.2.0-78027eb-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:04:29 -08:00
e2c650b027 Deploy authentik 2026.2.0 with BASE_DIR fix
Update image tag to v2026.2.0-e49d966-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:55:50 -08:00
c0e29476f3 Deploy authentik 2026.2.0 with TMPDIR fix
Update image tag to v2026.2.0-b7bfb0b-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:53:09 -08:00
38da372f94 Deploy authentik 2026.2.0 with /tmp fix
Update image tag to v2026.2.0-2ac353b-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:51:17 -08:00
098f3e517c Deploy authentik 2026.2.0 (source-built) to ArgoCD
Update image tag to v2026.2.0-efa9806-nix — the first source-built
authentik container from the build-authentik-from-source chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:44:35 -08:00