Commit graph

332 commits

Author SHA1 Message Date
e47a3b2ebb Review and update review-documentation.md
- Add last-reviewed date
- Replace raw pulumi commands with mise task equivalents
- Reference C0/C1/C2 change classification for making changes
- Note that prek handles link validation automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:59:51 -08:00
Forgejo Actions
2809ba6f50 Update docs release to v1.13.3
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 20:49:01 -08:00
55013db124 Add changelog fragment for Dagger v0.20.1 upgrade
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:42:00 -08:00
b793299d6d Upgrade Dagger engine from v0.20.0 to v0.20.1
Phase 2 of Dagger upgrade: bump engine version, update runner
deployment to v0.20.1-24f7512, and fix docs reference card version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:41:02 -08:00
ba7236ade0 Add how-to guide for upgrading Dagger
Documents the correct two-phase upgrade procedure to avoid the
chicken-and-egg problem where CI can't build its own replacement.
Also fixes outdated version references in the Dagger reference card.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:31:30 -08:00
Forgejo Actions
e95fb9a555 Update docs release to v1.13.2
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 19:03:24 -08:00
b64010b3c7 Replace spider-trap nginx 404s with robots.txt disallowing /explorer/
The /explorer/ SPA endpoints were the source of all spider-trap traffic.
A robots.txt Disallow is a better fix than serving 404s — it prevents
crawlers from entering the infinite URL tree in the first place, avoids
serving large numbers of 404s that hurt SEO, and doesn't break legitimate
deep links.
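
As a rough illustration of the Disallow rule described above, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` and asks which paths a compliant crawler would fetch. The file contents, the example paths, and the `ExampleBot` agent name are assumptions for illustration, not the deployed config.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the file actually served by the docs site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /explorer/
"""


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)
```

Only crawlers that honor robots.txt are kept out this way, and deep links under `/explorer/` still resolve for regular visitors.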

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:34:37 -08:00
Forgejo Actions
8b0ff3d7a5 Update docs release to v1.13.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 10:00:42 -08:00
6636576cdc Add spider-trap guards to docs.eblu.me Quartz nginx config
Block recursive crawler paths caused by SPA fallback + relative links:
/tags/ depth >1 returns 404, global depth ≥5 returns 404.
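
The two depth rules can be sketched in Python as below. This is a model of the rule, not the nginx config itself. It assumes "depth" means the number of non-empty path segments, that "/tags/ depth >1" means more than one segment below `/tags/`, and that every other request is served normally (shown here as 200). The function names are hypothetical.

```python
def segments(path: str) -> list[str]:
    """Split a URL path into its non-empty segments."""
    return [part for part in path.split("/") if part]


def guard_status(path: str) -> int:
    """Return the status the guards would produce under the stated assumptions."""
    parts = segments(path)
    # Rule 1: anything deeper than one level below /tags/ is rejected.
    if parts[:1] == ["tags"] and len(parts) - 1 > 1:
        return 404
    # Rule 2: any path with five or more segments is rejected.
    if len(parts) >= 5:
        return 404
    return 200
```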

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:43:41 -08:00
6e8d11c6bb Add :kustomized sentinel tag to manifest images, review devpi
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).
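
A check for bare image references could look like the following Python sketch. It is not the tooling used in this repository; the helper names are hypothetical, and digest-pinned references (`@sha256:...`) are deliberately not handled.

```python
from typing import Optional


def image_tag(ref: str) -> Optional[str]:
    """Return the tag of an image reference, or None if the reference is bare."""
    # Only look at the last path component so a registry port is not
    # mistaken for a tag (for example "localhost:5000/loki").
    last = ref.rsplit("/", 1)[-1]
    if ":" not in last:
        return None
    return last.split(":", 1)[1]


def bare_refs(refs: list[str]) -> list[str]:
    """Return the references that carry no explicit tag at all."""
    return [ref for ref in refs if image_tag(ref) is None]
```

With the sentinel in place, a reference such as `registry.ops.eblu.me/blumeops/loki:kustomized` reads as "tag supplied by kustomize", while a bare reference is flagged.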

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 08:15:06 -08:00
6d84fcfb05 Review how-to index: strip prose, add last-reviewed
Removed descriptions, table formatting, and Mikado chain commentary
from the how-to index — it should be links only. Added last-reviewed
date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 07:52:06 -08:00
Forgejo Actions
d98ef984ea Update docs release to v1.13.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-05 11:11:38 -08:00
448689bf2a Bump runner-job-image Dagger CLI from 0.19.11 to 0.20.0
Some checks failed
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 1s
Build Container (Nix) / build (runner-job-image) (push) Successful in 2s
Build Container / build (runner-job-image) (push) Failing after 2s
The Dagger module was upgraded to v0.20.0 in d15071a but the runner job
image still had the old CLI, causing build-blumeops to fail with a
version mismatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:58:38 -08:00
c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP─────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich    │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
d15071aaf9 Upgrade Dagger from v0.19.11 to v0.20.0 (#285)
## Summary
- Bump Dagger engine version from v0.19.11 to v0.20.0 in `dagger.json`
- Pin dagger CLI to `0.20.0` in `mise.toml` (was `"latest"`)
- Regenerated `.dagger/uv.lock` (new SDK deps: httpcore, beartype bump)

## Testing
- [x] `dagger call validate-workflows --src=.` passes on v0.20.0
- [ ] CI build workflow passes

Reviewed-on: #285
2026-03-05 09:32:13 -08:00
405fc59c12 Add Authentik OIDC login for ArgoCD (#284)
## Summary
- Add Authentik OAuth2 provider + application blueprint for ArgoCD (ringtail side)
- Add OIDC config to ArgoCD ConfigMap with Authentik as identity provider (indri side)
- Map Authentik `admins` group to ArgoCD `role:admin` via RBAC policy
- ExternalSecrets on both sides pull `argocd-client-secret` from 1Password
- Local admin password remains as break-glass — both login methods coexist

## Pre-deployment manual step
Add `argocd-client-secret` field to "Authentik (blumeops)" in 1Password with a random value (e.g., `openssl rand -hex 32`).

## Deployment order
1. Sync Authentik app on ringtail first (blueprint + secret + worker env var)
2. Sync ArgoCD app on indri second (cm, rbac, ExternalSecret)

## Verification
- [ ] `argocd-client-secret` field added to 1Password
- [ ] Authentik app synced on ringtail — blueprint applied, provider created
- [ ] ArgoCD app synced on indri — OIDC config applied
- [ ] SSO login works: visit `https://argocd.ops.eblu.me` → "Log in via Authentik" → admin access
- [ ] Break-glass: local admin/password login still works

Reviewed-on: #284
2026-03-05 09:07:25 -08:00
c029e5851a Review migrate-forgejo-from-brew doc, fix stale Phase 3 reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:29:58 -08:00
f6f0f79a5b Bump kiwix-serve from 3.8.1 to 3.8.2
Minor upstream release with doc and CI fixes. Also corrects kiwix.md
to reference the actual custom registry image and torrents.txt path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:12:32 -08:00
797133b28e Fix per-torrent rate panels showing cumulative bytes instead of rates
Dashboard "Download/Upload Rate by Torrent" panels were querying
transmission_torrent_download_bytes (total_size * percent_done) and
transmission_torrent_upload_bytes (uploaded_ever) — cumulative byte
gauges, not rates. Added new metrics using Transmission's native
rate_download/rate_upload and updated dashboard queries.
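
The difference between the two kinds of value can be sketched in Python. The field names follow the ones quoted above; the `TorrentSample` class and the output key names are hypothetical and do not come from the exporter.

```python
from dataclasses import dataclass


@dataclass
class TorrentSample:
    total_size: int       # bytes
    percent_done: float   # 0.0 to 1.0
    uploaded_ever: int    # bytes uploaded over the torrent's lifetime
    rate_download: int    # bytes per second right now
    rate_upload: int      # bytes per second right now


def cumulative_bytes(t: TorrentSample) -> dict:
    """Values the old panels plotted: totals that only grow over time."""
    return {
        "download_bytes": t.total_size * t.percent_done,
        "upload_bytes": t.uploaded_ever,
    }


def current_rates(t: TorrentSample) -> dict:
    """Values a rate panel should plot: instantaneous throughput."""
    return {"download_rate": t.rate_download, "upload_rate": t.rate_upload}
```

A stalled, half-finished torrent shows large cumulative bytes but zero rates, which is why the original panels were misleading.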

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:01:37 -08:00
f2704b26da Replace transmission-exporter with homegrown Python exporter (#283)
## Summary
- Replace unmaintained `metalmatze/transmission-exporter:master` sidecar with a homegrown Python exporter
- Uses `prometheus_client` + `transmission-rpc` with collect-on-scrape pattern (fresh metrics per scrape, no stale labels)
- Same metric names so existing Grafana Transmission dashboard works unchanged
- Container built with `uv` for dependency management, follows `grafana-sidecar` Dockerfile pattern
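
The collect-on-scrape idea can be shown with a standard-library-only Python sketch. It does not use `prometheus_client` or `transmission-rpc`, so it is a model of the pattern, not the exporter's code. The `fetch_torrents` callable and the `name` label are assumptions.

```python
from typing import Callable, Iterable


def scrape(fetch_torrents: Callable[[], Iterable[dict]]) -> str:
    """Build the metrics text from scratch on every scrape.

    Nothing is cached between calls, so a torrent that has been removed
    simply stops appearing; no stale label set is left behind.
    """
    lines = []
    for torrent in fetch_torrents():
        lines.append(
            f'transmission_torrent_download_bytes{{name="{torrent["name"]}"}} '
            f'{torrent["download_bytes"]}'
        )
    return "\n".join(lines)
```

The alternative, updating long-lived gauge objects in a background loop, would need explicit cleanup whenever a torrent disappears.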

## Changes
- **New:** `containers/transmission-exporter/exporter.py` — single-file exporter (~130 lines)
- **New:** `containers/transmission-exporter/Dockerfile` — multi-stage Alpine build with uv
- **Modified:** `argocd/manifests/torrent/deployment.yaml` — swap sidecar image reference
- **Modified:** `argocd/manifests/torrent/kustomization.yaml` — add image tag entry
- **Modified:** `service-versions.yaml` — add transmission-exporter entry

## Deployment and Testing
- [ ] Build container: `mise run container-build-and-release transmission-exporter`
- [ ] Update kustomization.yaml newTag with build SHA
- [ ] Branch deploy: `argocd app set torrent --revision feature/transmission-exporter-python && argocd app sync torrent`
- [ ] Verify metrics: `kubectl -n torrent --context=minikube-indri port-forward svc/transmission 19091:19091` then `curl localhost:19091/metrics | grep transmission_`
- [ ] Verify Grafana Transmission dashboard panels populate
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

Reviewed-on: #283
2026-03-04 21:55:00 -08:00
008da43736 Add OOMKill observability to Kubernetes Clusters dashboard
OOMKilled containers previously only appeared briefly in "Unhealthy Pods"
while dying, then vanished on restart. New panels use persistent metrics
(last_terminated_reason) and restart rate tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:53:07 -08:00
77a1ea15d2 Remove mikado frontmatter from closed chains, clarify finalization rules
During finalization, all mikado frontmatter (requires, status, branch) should
be removed — cards become plain documentation linked via wiki-links. Updated
agent-change-process docs and cleaned up 10 cards from closed chains. Also
fixed ai-docs referencing deleted plans/ files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:43:19 -08:00
55a846eb25 Retire plans directory, convert migrate-forgejo-from-brew to mikado card
The plans/ directory predated the mikado method approach. Deleted all
completed and abandoned plans, converted the still-relevant
migrate-forgejo-from-brew into a lean mikado chain root card under
how-to/forgejo/, cleaned up dangling wiki-links across docs, and
fixed a stale "pre-commit" reference to "prek".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:28:14 -08:00
6ca3c67705 Add Ollama reference card and update indexes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 19:43:14 -08:00
5ddb47de1c Review upgrade-grafana doc: fix image tag ref, add sidecar link
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 07:53:22 -08:00
b460333da0 Upgrade Transmission to 4.1.1 (#282)
## Summary
- Upgrade Transmission from 4.0.6-r4 to 4.1.1-r1
- Uses Alpine edge community repo for transmission packages, keeping stable alpine:3.22 base
- Fix stale image reference in service doc (was linuxserver, now custom registry image)
- Mark transmission as reviewed in service-versions.yaml

## Context
Service review found Transmission two minor versions behind (4.0.6 → 4.1.1). Alpine 3.22 only packages 4.0.6, so transmission is installed from edge's community repo with an exact version pin.

4.1.0 added improved µTP performance, IPv6/dual-stack UDP tracker, JSON-RPC 2.0 API. 4.1.1 is a bugfix release (20+ fixes).

Dagger test build passed locally.

## Deployment and Testing
- [ ] Build container via Forgejo workflow (`mise run container-build-and-release transmission`)
- [ ] Update kustomization.yaml with new image tag
- [ ] `argocd app set torrent --revision feature/transmission-review && argocd app sync torrent`
- [ ] Verify web UI at https://torrent.ops.eblu.me
- [ ] Check Grafana Transmission dashboard still receives metrics
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

## Note
The transmission-exporter sidecar (OOMKilling every ~30min, 294 restarts) is being tracked separately as a future replacement project.

Reviewed-on: #282
2026-03-04 07:44:33 -08:00
b3d5478020 Use towncrier orphan fragment naming for C0 changes
C0 changes have no branch name, so `main.<type>.md` fragments collide.
Switch to towncrier's `+<slug>.<type>.md` orphan convention and rename
existing `main.*` fragments.
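
The naming scheme can be sketched as a small Python helper. The function name and the slug rules (lowercase, runs of non-alphanumerics collapsed to hyphens) are assumptions, not the project's convention.

```python
import re


def orphan_fragment_name(summary: str, change_type: str) -> str:
    """Build a '+<slug>.<type>.md' file name for a change with no branch."""
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return f"+{slug}.{change_type}.md"
```

Because each slug is derived from the change itself, two unrelated C0 changes no longer compete for the same `main.<type>.md` name.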

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:30:00 -08:00
d7f0aa6f96 Fix Frigate database path to use persistent volume
The database was at /config/frigate.db (emptyDir, ephemeral) instead of
/db/frigate.db (PVC, persistent). Every pod restart wiped the database,
losing all recording history and leaving orphaned files on NFS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:18:16 -08:00
a4f5f7ce09 Add changelog fragment for Frigate memory limit bump
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:58:35 -08:00
a2bb9abbdb Home-build grafana-sidecar container (#281)
## Summary
- Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0`
- Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop
- Adds Dockerfile, service-versions entry, docs, and changelog fragment
- Manifest switch to home-built image pending container build

## Deployment and Testing
- [ ] `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization.yaml with built image tag
- [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana`
- [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me
- [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana`

Reviewed-on: #281
2026-03-03 13:48:24 -08:00
81a8ca24b9 Clarify that changelog fragments apply to all change levels (C0–C2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:15:06 -08:00
876e51dd77 Allow implicit octals in yamllint and normalize k8s mode values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:10:44 -08:00
4518fa3ac3 Add changelog fragment for Gandi bookmark
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:06:02 -08:00
3dc4ed730b Build Loki container image locally (#280)
## Summary
- Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/`
- Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki`
- Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build

## Details
- UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup)
- CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build`
- Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session)
- Pattern follows miniflux (two-stage Go) + prometheus (ldflags)

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release loki`
- [ ] Update kustomize tag to actual build tag
- [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki`
- [ ] Verify `/ready` endpoint and log ingestion
- [ ] After merge: update to `[main]` tag (C0 follow-up)

Reviewed-on: #280
2026-03-03 13:00:43 -08:00
eb9bc57351 Upgrade TeslaMate v2.2.0 → v3.0.0 (#279)
## Summary
- Upgrade TeslaMate from v2.2.0 to v3.0.0 (first service review)
- Elixir 1.18 → 1.19.5, runtime base bookworm → trixie
- Adds zstd/brotli build deps for new static asset compression
- DB migration (BTREE → BRIN indexes) runs automatically via entrypoint

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release teslamate`
- [ ] Update kustomization.yaml with new image tag
- [ ] Deploy from branch: `argocd app set teslamate --revision upgrade/teslamate-v3.0.0 && argocd app sync teslamate`
- [ ] Verify TeslaMate UI loads and data is intact
- [ ] Check logs for migration errors
- [ ] After merge: reset ArgoCD to main, update kustomization tag to `[main]` image

Reviewed-on: #279
2026-03-03 11:56:40 -08:00
823a35bb9a Add changelog fragment for ringtail firewall fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:17:48 -08:00
6c5a99883f Add pre-commit check for changelog fragment placement
Misfiled fragment from feature/ branch created a subdirectory under
changelog.d/ which towncrier doesn't support. Move the fragment to the
correct flat location and add a changelog-check mise task + prek hook
to prevent this from happening again.
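
A placement check of this kind might look like the Python sketch below. It is not the `changelog-check` task itself; the function name is hypothetical, and it assumes fragments are `.md` files that must sit directly inside the changelog directory.

```python
from pathlib import Path


def misplaced_fragments(changelog_dir: Path) -> list[Path]:
    """Return fragment files that are nested in a subdirectory."""
    return sorted(
        path
        for path in changelog_dir.rglob("*.md")
        if path.parent != changelog_dir
    )
```

A hook could call this on `docs/changelog.d` and fail the commit when the returned list is non-empty.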

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:49:01 -08:00
7b68be2e80 Add fly.io proxy observability and app logs to Forgejo dashboard
Rename "Forgejo Repository Health" to "Forgejo" and add proxy metrics
(request rate, error rate, RPS, latency, bandwidth), proxy access logs,
and Forgejo application logs from Loki.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:24:53 -08:00
cf8736c73b Review kustomize-grafana-deployment: fix manifest table to match reality
The doc listed a nonexistent configmap.yaml instead of the actual raw
config files (grafana.ini, datasources.yaml, provider.yaml) consumed
by kustomization.yaml's configMapGenerator. Added last-reviewed date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:14:41 -08:00
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
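
The ban threshold in the checklist (5 failed logins in 10 minutes) can be modelled as a sliding window. This Python sketch covers only that counting rule and is not fail2ban's implementation; the `FailureWindow` class and its parameter names are hypothetical.

```python
from collections import deque


class FailureWindow:
    """Track failed logins for one client inside a sliding time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 600.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._times = deque()

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now`; return True if the client should be banned."""
        self._times.append(now)
        # Forget failures that have aged out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.max_failures
```

Five failures in quick succession trip the threshold, while the same five spread over a longer period do not.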

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
31d925814f Deploy Ollama LLM server on ringtail (#277)
## Summary
- Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration
- Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern)
- Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b`
- hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi)
- Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet
- Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080
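
The declarative model list can be illustrated with a short Python sketch of the reconciliation step. The `models.txt` format (one model per line, `#` comments allowed) and both function names are assumptions; the real sidecar script may work differently.

```python
def parse_models(text: str) -> list[str]:
    """Read model names from models.txt-style text, skipping blanks and comments."""
    models = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            models.append(line)
    return models


def models_to_pull(desired: list[str], installed: list[str]) -> list[str]:
    """Return desired models that are not installed yet, in file order."""
    present = set(installed)
    return [model for model in desired if model not in present]
```

A sync loop would then pull each returned model and repeat on the next change to the file.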

## Deployment and Testing
- [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin`
- [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2`
- [ ] Sync `apps` app with `--revision feature/ollama-ringtail`
- [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama`
- [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail`
- [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags`
- [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'`
- [ ] Verify Frigate still works after GPU sharing change
- [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277
2026-03-02 20:39:51 -08:00
Forgejo Actions
0f79c61c42 Update docs release to v1.12.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-02 18:17:07 -08:00
7a1875936c Switch git hooks from pre-commit to prek (#276)
## Summary

- Replace pre-commit with [prek](https://github.com/j178/prek), a faster Rust-native drop-in alternative
- Migrate config from `.pre-commit-config.yaml` (YAML) to `prek.toml` (TOML)
- Add new built-in checks: case conflicts, private key detection, executable shebangs
- Install prek via mise native registry (`aqua:j178/prek`) instead of pipx
- Update all doc references across README, contributing guide, and how-to docs

## Notes

- `check-yaml` still uses the remote `pre-commit-hooks` repo because prek's builtin fast path doesn't support `--unsafe` yet (needed for Ansible custom YAML tags)
- All existing custom hooks (docs validation, container version check, mikado invariant, workflow validation) work unchanged
- Tested: all hooks pass on clean tree, deliberate doc link breakage is caught

## Test plan

- [x] `prek run --all-files` passes all checks
- [x] Broken wiki-link correctly caught by `docs-check-links`
- [x] taplo-format auto-fixes TOML formatting on commit
- [x] commit-msg hook (mikado invariant) fires correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/276
2026-03-02 18:15:23 -08:00
2d54f93c68 Add changelog for impl-card-guard feature
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:45:01 -08:00
9465b75815 Add changelog for authentik source chain doc review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:28:09 -08:00
08b9570ac7 Review build-authentik-from-source Mikado chain docs
Fix go-server-derivation: wrong path target (webui not authentik-django)
and missing internal/web/static.go patch. Remove stale DRF fork content
from mirror-build-deps (no longer needed as of 2026.2.0). Add
last-reviewed to all 5 cards without it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:28:09 -08:00
Forgejo Actions
847e47eaf3 Update docs release to v1.12.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-01 17:24:09 -08:00
2a2811d7a5 Review authentik-api-client-generation doc: fix stale content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:21:46 -08:00
c9d273dc81 Update authentik changelog fragment to mention version upgrade
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:11:41 -08:00
2d4098e480 Fix authentik 2026.2.0 migration ordering bug (#275)
## Summary

- Patch `authentik_rbac/0010` migration to depend on `authentik_core/0056`, fixing non-deterministic ordering that crashes startup with `FieldError: Cannot resolve keyword 'group_id'`
- Upstream bug: goauthentik/authentik#19616, #20634 — no fix released yet
- Document the issue in the lessons-learned table
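
Why the added dependency makes the order deterministic can be shown with a small Python model built on the standard `graphlib` module. This is not Django's migration loader, and the node labels are just the abbreviated names used above.

```python
from graphlib import TopologicalSorter

CORE = "authentik_core/0056"
RBAC = "authentik_rbac/0010"


def migration_order(patched: bool) -> list[str]:
    """Return one valid run order for the two migrations.

    Without the patch the two nodes are unrelated, so either order is
    valid and the outcome depends on incidental ordering. With the patch,
    RBAC lists CORE as a predecessor and must run after it.
    """
    graph = {CORE: set(), RBAC: {CORE} if patched else set()}
    return list(TopologicalSorter(graph).static_order())
```

In the patched graph every valid order places `authentik_core/0056` first, which is the behavior the deployment checklist verifies in the logs.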

## Deployment and Testing

- [ ] CI builds container image
- [ ] Deploy from branch: `argocd app set authentik --revision fix/authentik-migration-ordering && argocd app sync authentik`
- [ ] Pods reach Running/Ready without crash-looping
- [ ] `kubectl logs` show 0056 migrating before 0010
- [ ] authentik UI loads at authentik.ops.eblu.me
- [ ] `mise run services-check`
- [ ] After merge: `argocd app set authentik --revision main && argocd app sync authentik`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/275
2026-03-01 16:28:36 -08:00