Commit graph

568 commits

Author SHA1 Message Date
4518fa3ac3 Add changelog fragment for Gandi bookmark
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:06:02 -08:00
eceea2126b Add Gandi bookmark to homepage dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:05:50 -08:00
51626e6630 Update Loki to v3.6.5-3dc4ed7 container image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:01:49 -08:00
3dc4ed730b Build Loki container image locally (#280)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (loki) (push) Successful in 2s
Build Container / build (loki) (push) Successful in 7s
## Summary
- Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/`
- Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki`
- Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build

## Details
- UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup)
- CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build`
- Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session)
- Pattern follows miniflux (two-stage Go) + prometheus (ldflags)

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release loki`
- [ ] Update kustomize tag to actual build tag
- [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki`
- [ ] Verify `/ready` endpoint and log ingestion
- [ ] After merge: update to `[main]` tag (C0 follow-up)

Reviewed-on: #280
2026-03-03 13:00:43 -08:00
f914a14653 Update teslamate to v3.0.0-eb9bc57 container image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 12:02:26 -08:00
eb9bc57351 Upgrade TeslaMate v2.2.0 → v3.0.0 (#279)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (teslamate) (push) Successful in 2s
Build Container / detect (push) Successful in 26s
Build Container / build (teslamate) (push) Successful in 4m22s
## Summary
- Upgrade TeslaMate from v2.2.0 to v3.0.0 (first service review)
- Elixir 1.18 → 1.19.5, runtime base bookworm → trixie
- Adds zstd/brotli build deps for new static asset compression
- DB migration (BTREE → BRIN indexes) runs automatically via entrypoint

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release teslamate`
- [ ] Update kustomization.yaml with new image tag
- [ ] Deploy from branch: `argocd app set teslamate --revision upgrade/teslamate-v3.0.0 && argocd app sync teslamate`
- [ ] Verify TeslaMate UI loads and data is intact
- [ ] Check logs for migration errors
- [ ] After merge: reset ArgoCD to main, update kustomization tag to `[main]` image

Reviewed-on: #279
2026-03-03 11:56:40 -08:00
f91b8d0d98 Fix mirror-update-pats corrupting all GitHub mirror URLs
The bash parameter expansion `${var/pat/rep}` treats `\/` in the
replacement as a literal backslash-slash, not an escaped delimiter.
This produced URLs like `https:\/\/eblume:...` instead of
`https://eblume:...`, breaking Forgejo's URL parser (500 on mirror
settings pages) and preventing mirror syncs.

Use prefix stripping (`${var#prefix}`) instead.

All 22 corrupted mirrors have been repaired on indri.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:46:41 -08:00
823a35bb9a Add changelog fragment for ringtail firewall fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:17:48 -08:00
c5d82b0942 Trust k3s CNI interfaces in ringtail NixOS firewall
The NixOS firewall was blocking pod-to-host TCP traffic because only
tailscale0 was trusted. Pods could ping the host but not reach the
API server (port 6443), breaking Tailscale Ingress TLS cert refresh
and all ringtail services (authentik, frigate, ntfy, ollama).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:15:02 -08:00
6c5a99883f Add pre-commit check for changelog fragment placement
Misfiled fragment from feature/ branch created a subdirectory under
changelog.d/ which towncrier doesn't support. Move the fragment to the
correct flat location and add a changelog-check mise task + prek hook
to prevent this from happening again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:49:01 -08:00
01d3b4d1c7 Switch forgejo-runner ArgoCD app to internal SSH repo URL
Was the only app still using https://forge.eblu.me (public proxy) for
git polling. All other apps already use the internal SSH endpoint at
forge.ops.eblu.me.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:43:01 -08:00
82884436df Route runner polling through internal forge.ops.eblu.me
The k8s and ringtail runners were hitting forge.eblu.me (fly.io proxy)
for every FetchTask poll (~every 2s), round-tripping through the public
internet unnecessarily. Use forge.ops.eblu.me (Caddy on indri, tailnet)
for infrastructure workloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:33:40 -08:00
7b68be2e80 Add fly.io proxy observability and app logs to Forgejo dashboard
Rename "Forgejo Repository Health" to "Forgejo" and add proxy metrics
(request rate, error rate, RPS, latency, bandwidth), proxy access logs,
and Forgejo application logs from Loki.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:24:53 -08:00
cf8736c73b Review kustomize-grafana-deployment: fix manifest table to match reality
The doc listed a nonexistent configmap.yaml instead of the actual raw
config files (grafana.ini, datasources.yaml, provider.yaml) consumed
by kustomization.yaml's configMapGenerator. Added last-reviewed date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:14:41 -08:00
86a0dee000 Remove ollama LAN NodePort service
The sanctioned ingress is ollama.ops.eblu.me via tailnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:00:05 -08:00
3af346f1cd Move ollama LAN NodePort to port 80
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:37:50 -08:00
2d5b78a4d7 Add Forge (public) to services-check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 08:48:15 -08:00
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m28s
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
a32c99a252 Limit ollama to one loaded model and one parallel request
Prevents OOM when switching between models — only one 14B model
fits in 16GB VRAM at a time with KV cache for context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:23:12 -08:00
203e3cd567 Add NodePort service for ollama LAN access
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:57:18 -08:00
31d925814f Deploy Ollama LLM server on ringtail (#277)
## Summary
- Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration
- Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern)
- Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b`
- hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi)
- Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet
- Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080

## Deployment and Testing
- [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin`
- [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2`
- [ ] Sync `apps` app with `--revision feature/ollama-ringtail`
- [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama`
- [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail`
- [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags`
- [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'`
- [ ] Verify Frigate still works after GPU sharing change
- [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277
2026-03-02 20:39:51 -08:00
Forgejo Actions
0f79c61c42 Update docs release to v1.12.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-02 18:17:07 -08:00
7a1875936c Switch git hooks from pre-commit to prek (#276) v1.12.1
## Summary

- Replace pre-commit with [prek](https://github.com/j178/prek), a faster Rust-native drop-in alternative
- Migrate config from `.pre-commit-config.yaml` (YAML) to `prek.toml` (TOML)
- Add new built-in checks: case conflicts, private key detection, executable shebangs
- Install prek via mise native registry (`aqua:j178/prek`) instead of pipx
- Update all doc references across README, contributing guide, and how-to docs

## Notes

- `check-yaml` still uses the remote `pre-commit-hooks` repo because prek's builtin fast path doesn't support `--unsafe` yet (needed for Ansible custom YAML tags)
- All existing custom hooks (docs validation, container version check, mikado invariant, workflow validation) work unchanged
- Tested: all hooks pass on clean tree, deliberate doc link breakage is caught

## Test plan

- [x] `prek run --all-files` passes all checks
- [x] Broken wiki-link correctly caught by `docs-check-links`
- [x] taplo-format auto-fixes TOML formatting on commit
- [x] commit-msg hook (mikado invariant) fires correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/276
2026-03-02 18:15:23 -08:00
940219338a Slightly less confusing output when mikado cards share a prereq 2026-03-02 17:58:18 -08:00
2d54f93c68 Add changelog for impl-card-guard feature
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:45:01 -08:00
6131f881a8 Enforce impl commits can't modify Mikado card files
The mikado-branch-invariant-check hook now inspects staged files (commit-msg
hook) and historical commit files (standalone mode) to reject impl commits
that touch markdown files with Mikado frontmatter (requires:, status:, or
branch: mikado/). Cards should only be modified by plan, close, or finalize.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:44:34 -08:00
10d0636cee Review navidrome and miniflux: both at latest versions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:32:17 -08:00
9465b75815 Add changelog for authentik source chain doc review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:28:09 -08:00
08b9570ac7 Review build-authentik-from-source Mikado chain docs
Fix go-server-derivation: wrong path target (webui not authentik-django)
and missing internal/web/static.go patch. Remove stale DRF fork content
from mirror-build-deps (no longer needed as of 2026.2.0). Add
last-reviewed to all 5 cards without it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:28:09 -08:00
Forgejo Actions
847e47eaf3 Update docs release to v1.12.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-01 17:24:09 -08:00
2a2811d7a5 Review authentik-api-client-generation doc: fix stale content v1.12.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:21:46 -08:00
c9d273dc81 Update authentik changelog fragment to mention version upgrade
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:11:41 -08:00
503775085d Deploy authentik 2026.2.0 with migration ordering fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:32:10 -08:00
2d4098e480 Fix authentik 2026.2.0 migration ordering bug (#275)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container (Nix) / detect (push) Successful in 1s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 3m6s
## Summary

- Patch `authentik_rbac/0010` migration to depend on `authentik_core/0056`, fixing non-deterministic ordering that crashes startup with `FieldError: Cannot resolve keyword 'group_id'`
- Upstream bug: goauthentik/authentik#19616, #20634 — no fix released yet
- Document the issue in the lessons-learned table

## Deployment and Testing

- [ ] CI builds container image
- [ ] Deploy from branch: `argocd app set authentik --revision fix/authentik-migration-ordering && argocd app sync authentik`
- [ ] Pods reach Running/Ready without crash-looping
- [ ] `kubectl logs` show 0056 migrating before 0010
- [ ] authentik UI loads at authentik.ops.eblu.me
- [ ] `mise run services-check`
- [ ] After merge: `argocd app set authentik --revision main && argocd app sync authentik`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/275
2026-03-01 16:28:36 -08:00
90621e4155 Deploy authentik 2026.2.0 with entry_points fix
Update image tag to v2026.2.0-78027eb-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:04:29 -08:00
78027eb783 Fix authentik: entry_points API for Python 3.14
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 3m7s
Python 3.14's EntryPoints uses string keys, not integer indices.
eps[0] raises KeyError(0); use next(iter(eps)) instead.
Verified on ringtail with the actual venv python.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:02:49 -08:00
e2c650b027 Deploy authentik 2026.2.0 with BASE_DIR fix
Update image tag to v2026.2.0-e49d966-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:55:50 -08:00
e49d966229 Fix authentik: symlink authentik/ for BASE_DIR resolution
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 3m4s
Django's BASE_DIR is $out but source lives in site-packages. Code
like BASE_DIR / "authentik" / "sources" / "scim" / "schemas" / ...
needs a top-level symlink to find data files alongside Python source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:55:26 -08:00
c0e29476f3 Deploy authentik 2026.2.0 with TMPDIR fix
Update image tag to v2026.2.0-b7bfb0b-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:53:09 -08:00
b7bfb0bfae Fix authentik container: set TMPDIR=/tmp
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 45s
lifecycle/ak uses ${TMPDIR}/authentik-mode — without TMPDIR set it
tries to write /authentik-mode in root, which user 65534 can't do.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:52:36 -08:00
38da372f94 Deploy authentik 2026.2.0 with /tmp fix
Update image tag to v2026.2.0-2ac353b-nix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:51:17 -08:00
2ac353b7bf Fix authentik container: create /tmp for unprivileged user
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 54s
buildLayeredImage doesn't create /tmp by default. The container runs
as user 65534 (nobody) which can't mkdir /tmp at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:48:05 -08:00
098f3e517c Deploy authentik 2026.2.0 (source-built) to ArgoCD
Update image tag to v2026.2.0-efa9806-nix — the first source-built
authentik container from the build-authentik-from-source chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:44:35 -08:00
efa9806bfa C2: Build authentik from source (Mikado chain) (#274)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 22s
## Mikado Chain: build-authentik-from-source

Replace `pkgs.authentik` from nixpkgs with a custom Nix derivation built from source.
This removes the dependency on the nixpkgs packaging timeline and gives full version control.

Target version: **2025.12.4** (nixpkgs reference, upgrading from deployed 2025.10.1).

### Dependency Graph

```
build-authentik-from-source (goal)
├── authentik-go-server-derivation
│   ├── authentik-api-client-generation  ← IN PROGRESS
│   └── authentik-python-backend-derivation
├── authentik-web-ui-derivation
│   └── authentik-api-client-generation  ← IN PROGRESS
└── authentik-python-backend-derivation
```

### Ready Leaves
- `authentik-api-client-generation` — Go + TypeScript client generation from OpenAPI schema
- `authentik-python-backend-derivation` — Django backend with 60+ deps, 4 in-tree packages

### Architecture
Ported from [nixpkgs `pkgs/by-name/au/authentik/package.nix`](https://github.com/NixOS/nixpkgs/tree/master/pkgs/by-name/au/authentik):
- `source.nix` — shared version/source fetch
- `client-go.nix` — Go API client generation
- `client-ts.nix` — TypeScript API client generation
- `api-go-vendor-hook.nix` — Go vendor directory injection hook
- (more components to follow as leaves are closed)

### Related Cards
- [[build-authentik-from-source]] — Goal card
- [[authentik-api-client-generation]]
- [[authentik-python-backend-derivation]]
- [[authentik-web-ui-derivation]]
- [[authentik-go-server-derivation]]

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/274
2026-03-01 13:45:00 -08:00
0aaf9bb8b2 Add Dagger local build step to authentik source build goal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:39:25 -08:00
7094ea7d3e Start C2 Mikado chain: build authentik from source
Create goal card and 4 prerequisite cards for building authentik from a
custom Nix derivation instead of using pkgs.authentik from nixpkgs. This
removes the dependency on the nixpkgs packaging timeline and gives full
version control over authentik releases.

Chain: mikado/authentik-source-build
Leaf nodes: authentik-api-client-generation, authentik-python-backend-derivation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:20:17 -08:00
922265c88f Add changelog fragment for grafana doc review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 07:28:47 -08:00
8d1e98617b Review build-grafana-container docs: stamp reviewed, fix cross-links
Also fix stale grafana.md reference card (Helm → Kustomize).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 07:28:06 -08:00
02eb169403 Pin blumeops-pg to PostgreSQL 18.3
Replace floating :18 tag with pinned :18.3 (upstream out-of-cycle
release fixing 18.2 regressions). Stamps service as reviewed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 16:25:32 -08:00
2312e5fbf8 Update ringtail flake inputs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 15:22:29 -08:00