Commit graph

84 commits

Author SHA1 Message Date
4680e4d4ff Fix Pyroscope config flag: -config.file not -config.path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:15:46 -07:00
a269bd6526 Fix UI installPhase: use relative path from cwd after cd in buildPhase
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:04:17 -07:00
78a89d7ed1 Fix webpack CopyPlugin: rewrite icons path instead of deleting entry
Previous sed deleted lines which broke the webpack config syntax.
Instead, rewrite the `from:` path to point at the pre-copied icons
directory which webpack can access without following Nix store symlinks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:56:55 -07:00
10398c715c Fix icons path: use root node_modules in mkYarnPackage layout
mkYarnPackage hoists most dependencies to the root node_modules/,
with deps/grafana-pyroscope/node_modules/ containing only package-
specific overrides. @grafana/ui lives at ../../node_modules/, not
the local node_modules/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:55:40 -07:00
f727a352c3 Fix icons copy: use directory copy instead of glob expansion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:50:37 -07:00
98bff9e3eb Fix icons glob: pre-copy icons and patch webpack CopyPlugin
mkYarnPackage symlinks node_modules into the read-only Nix store,
so we can't modify the directory in-place. Instead, pre-copy the
@grafana/ui icons to the webpack output location and sed-remove the
CopyPlugin entry that tries to glob through the symlink.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:45:39 -07:00
1327972183 Restore embedded frontend with symlink workaround for icons glob
mkYarnPackage symlinks node_modules from the Nix store, but webpack's
CopyPlugin can't follow symlinks when resolving glob patterns. The
@grafana/ui icons directory is dereferenced (cp -rL) before running
the webpack build. Also fixes binary output path (root, not cmd/).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:43:11 -07:00
ef2385322a Drop embedded frontend from Pyroscope build
The webpack build fails on a @grafana/ui icons glob that doesn't
resolve in the Nix sandbox. Since Grafana is the primary UI via the
pyroscope datasource plugin, the standalone web UI isn't needed.
Build with EMBEDASSETS="" (upstream's build-dev path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:39:34 -07:00
602a89812b Fix Pyroscope build: add frontend via mkYarnPackage, use go/bin target
The upstream Makefile's `build` target runs frontend via Docker, which
isn't available in the Nix sandbox. Instead, build the frontend
(yarn + webpack) separately via mkYarnPackage, copy assets to
public/build/, then invoke `make go/bin` with EMBEDASSETS=embedassets
to compile the Go binary with embedded frontend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:31:40 -07:00
8a766a907d Set real goModules hash for Pyroscope Nix build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:29:09 -07:00
268645f561 Fix goModules derivation: disable fixup to avoid store path references
The Go toolchain auto-download (triggered by `toolchain go1.25.8` in
go.mod) puts scripts with shebangs into the module cache. Nix fixup
phase patches these to reference /nix/store paths, violating the
fixed-output derivation contract. dontFixup = true leaves the cache
as-is, which is fine since these files are only consumed as build
inputs by the pyroscope derivation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:25:33 -07:00
01317634ad Add Nix container build for Pyroscope and update to v1.19.1
Nix derivation follows the Alloy pattern: stdenv + pre-fetched Go
modules for multi-module workspace (go.work with ./api, ./lidia).
goModules hash is a placeholder (fakeHash) — first build on ringtail
will produce the real hash. Kustomization updated to use local
registry image. Service-versions entries added for pyroscope and
alloy-profiling-ringtail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:20:05 -07:00
e375859221 Upgrade Homepage container to v1.11.0
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dockerfile (homepage) (push) Successful in 5m47s
Minor release with new widgets (Tracearr, SparklyFitness), Seerr rename,
and dependency bumps. No breaking changes for our config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:17:36 -07:00
696024306c Add Prowler image vulnerability scanning for blumeops containers
All checks were successful
Build Container / detect (push) Successful in 39s
Build Container / build-dockerfile (prowler) (push) Successful in 10m15s
Add Trivy to the Prowler container for image and IaC scanning.
New CronJob (Saturday 3am) scans all blumeops/* images in the
registry for CVEs, embedded secrets, and Dockerfile misconfigs.
Reports written to sifaka:/volume1/reports/prowler-images/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:43:08 -07:00
d021b3534f Deploy Prowler CIS scanner (#310)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dockerfile (prowler) (push) Successful in 10s
## Summary
- Deploy Prowler 5 as a weekly CronJob on minikube-indri for CIS Kubernetes Benchmark v1.11 scanning
- Custom slim container build (strips PowerShell, Trivy, and non-K8s providers from upstream)
- Reports (HTML, CSV, JSON-OCSF) written to NFS share on sifaka at `/volume1/reports/prowler/`
- Read-only ClusterRole for pod, RBAC, and control plane inspection
- Host path mounts + hostPID for kubelet file permission checks

## Follow-ups
- Mirror prowler-cloud/prowler on forge for supply chain control
- Build and push container image, update kustomization.yaml newTag
- Consider adding k3s-ringtail scanning (core + RBAC checks only)

## Test plan
- [ ] Build container: `mise run container-release prowler v5.22.0`
- [ ] Update `argocd/manifests/prowler/kustomization.yaml` newTag to built image tag
- [ ] Sync ArgoCD: `argocd app sync apps && argocd app set prowler --revision deploy-prowler && argocd app sync prowler`
- [ ] Trigger manual job: `kubectl create job --from=cronjob/prowler prowler-manual -n prowler --context=minikube-indri`
- [ ] Verify reports appear on sifaka NFS share
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #310
2026-03-24 16:08:09 -07:00
fd0bebb0fc Localize authentik-redis container (#309)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dockerfile (alloy) (push) Successful in 12s
Build Container / build-dockerfile (ntfy) (push) Successful in 11s
Build Container / build-nix (alloy) (push) Successful in 20s
Build Container / build-nix (authentik) (push) Successful in 6m10s
Build Container / build-nix (authentik-redis) (push) Successful in 20s
Build Container / build-nix (ntfy) (push) Successful in 6s
## Summary

- Replace upstream `docker.io/library/redis:7-alpine` (Redis 7.4.8) with a nix-built container using Redis 8.2.3 from nixpkgs
- Introduce **attached service pattern**: `parent` field in service-versions.yaml, `<parent>-<component>` naming convention, and `assert pkgs.redis.version == version` in default.nix to prevent silent version drift on `flake.lock` updates
- Document the pattern in [[review-services]] so future attached services slot in cleanly
- Backfill `parent: grafana` on existing `grafana-sidecar` entry

## Version drift protection

1. `flake.lock` update bumps nixpkgs redis → `assert` in `default.nix` breaks `nix-build`
2. Developer updates `version` in `default.nix` → prek's `container-version-check` demands matching `service-versions.yaml` update
3. Both must agree before commit succeeds

## Test plan

- [ ] Build container from branch on ringtail (`mise run container-build-and-release authentik-redis`)
- [ ] Update kustomization `newTag` to branch-built image tag
- [ ] Sync authentik ArgoCD app from branch (`argocd app set authentik --revision localize-redis && argocd app sync authentik`)
- [ ] Verify Authentik login, session persistence, and task queue still work
- [ ] After merge: C0 follow-up to update `newTag` to the main-built image tag

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #309
2026-03-24 13:27:36 -07:00
fc45989a6c Decommission JobSync service (#308)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary

- Remove all JobSync infrastructure: ArgoCD app, k8s manifests, container build (nix), Caddy reverse proxy entry, Homepage dashboard entry, service-versions tracking, and all documentation
- Runtime teardown already completed: ArgoCD app cascade-deleted (removes deployment, PVC, service, ingress, external-secret), forge mirror deleted, 1Password item archived, local clone removed

## Motivation

Replacing JobSync with a datasette-based job tracking pipeline driven by mise tasks and a Claude agent frontend. JobSync's Next.js server actions don't expose a useful API for automation.

## Remaining manual steps after merge

- Provision Caddy to remove the stale proxy route: `mise run provision-indri -- --tags caddy`
- Sync Homepage: `argocd app sync homepage`
- Verify namespace cleanup on ringtail: `kubectl get ns jobsync --context=k3s-ringtail` (should be gone)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #308
2026-03-24 08:44:23 -07:00
bd0ff30d3f Unify container build workflows (#306)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary
- Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow
- Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix`
- Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo
- Containers with both build files (alloy, nettest, ntfy) get built on both runners

## Test plan
- [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only
- [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only
- [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners
- [ ] Test workflow_dispatch with a specific container name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #306
2026-03-23 20:55:50 -07:00
d1dac0c241 Upgrade ntfy v2.17.0 → v2.19.2 (#305)
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 3s
Build Container (Nix) / build (ntfy) (push) Successful in 4s
Build Container / build (ntfy) (push) Successful in 11s
## Summary
- Upgrade ntfy from v2.17.0 to v2.19.2
- Update Dockerfile and Nix build definitions with new version, commit SHA, and hashes
- Add `subPackages = [ "." ]` to Nix build to handle new `tools/loadtest` module in upstream

## Upstream changes (no breaking changes)
- **v2.18.0:** Experimental PostgreSQL backend support
- **v2.19.0:** PostgreSQL read replica support, notification sound throttling
- **v2.19.1-2:** PostgreSQL bug fixes, web push race condition fix

## Test plan
- [ ] Container builds complete on Forgejo Actions (both Dockerfile and Nix)
- [ ] Update kustomization.yaml `newTag` to the built nix image tag
- [ ] `argocd app set ntfy --revision upgrade/ntfy-v2.19.2 && argocd app sync ntfy`
- [ ] Verify ntfy health: `curl https://ntfy.ops.eblu.me/v1/health`
- [ ] Send a test notification

Reviewed-on: #305
2026-03-23 10:32:06 -07:00
f9426b734c Update loki to 3.6.7 (#302)
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (loki) (push) Successful in 1s
Build Container / build (loki) (push) Successful in 6s
Reviewed-on: #302
2026-03-20 16:02:28 -07:00
613f05dfde Add consistent OCI labels to all container Dockerfiles
All checks were successful
Build Container (Nix) / build (miniflux) (push) Successful in 2s
Build Container (Nix) / build (navidrome) (push) Successful in 2s
Build Container / build (devpi) (push) Successful in 41s
Build Container (Nix) / build (nettest) (push) Successful in 15s
Build Container / build (grafana-sidecar) (push) Successful in 1m27s
Build Container / build (grafana) (push) Successful in 3m23s
Build Container (Nix) / build (ntfy) (push) Successful in 3m19s
Build Container (Nix) / build (prometheus) (push) Successful in 1s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container (Nix) / build (runner-job-image) (push) Successful in 1s
Build Container (Nix) / build (teslamate) (push) Successful in 2s
Build Container (Nix) / build (transmission) (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 1s
Build Container (Nix) / build (unpoller) (push) Successful in 1s
Build Container / build (kiwix-serve) (push) Successful in 1m17s
Build Container / build (kubectl) (push) Successful in 41s
Build Container / build (homepage) (push) Successful in 8m21s
Build Container / build (mealie) (push) Successful in 1m1s
Build Container / build (loki) (push) Successful in 8m21s
Build Container / build (miniflux) (push) Successful in 2m24s
Build Container / build (nettest) (push) Successful in 14s
Build Container / build (ntfy) (push) Successful in 8m33s
Build Container / build (prometheus) (push) Successful in 37s
Build Container / build (quartz) (push) Successful in 19s
Build Container / build (navidrome) (push) Successful in 10m36s
Build Container / build (runner-job-image) (push) Successful in 3m18s
Build Container / build (transmission) (push) Successful in 20s
Build Container / build (transmission-exporter) (push) Successful in 21s
Build Container / build (unpoller) (push) Successful in 11s
Build Container / build (teslamate) (push) Successful in 4m42s
Every container now carries title, description, version, source, and
vendor labels per the OCI image spec. Version is derived from the
existing CONTAINER_APP_VERSION ARG at build time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 20:42:00 -07:00
0d2779762a Upgrade Prometheus to v3.10.0 (#301)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (prometheus) (push) Successful in 2s
Build Container / build (prometheus) (push) Successful in 44m11s
## Summary
- Bump Prometheus from v3.9.1 to v3.10.0 in custom container Dockerfile
- v3.10.0 adds distroless Docker image variants, new PromQL `fill` operators, and performance improvements
- Dagger build tested locally — builds cleanly

## Remaining after merge
- Update `kustomization.yaml` newTag with the auto-built image tag
- Update `service-versions.yaml` (last-reviewed + current-version)
- ArgoCD sync

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #301
2026-03-18 07:47:46 -07:00
61f02a0335 Localize Alloy container image (#300)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (alloy) (push) Successful in 14s
Build Container / build (alloy) (push) Successful in 38m34s
## Summary

- Add `containers/alloy/` with dual Dockerfile + Nix build files for Grafana Alloy v1.14.0
- Both builds fetch source from forge mirror (`forge.ops.eblu.me/mirrors/alloy.git`), build the web UI (Node), then compile the Go binary with `netgo embedalloyui` tags
- Update all three alloy deployments (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail) to use `registry.ops.eblu.me/blumeops/alloy`
- `promtail_journal_enabled` tag omitted — requires systemd headers and none of our configs use `loki.source.journal`

## Build verification

- **Dockerfile:** Tested locally via `docker build`, binary reports `v1.14.0` with correct tags
- **Nix:** Tested on ringtail via `nix-build`, all three hashes (fetchgit, npmDeps, goModules) resolved and build succeeds

## Post-merge steps

1. Wait for CI to build the container from main (both Dockerfile and Nix workflows)
2. `mise run container-list alloy` to find the `[main]` tagged image
3. C0 follow-up to update `newTag` in all three kustomizations from `v1.14.0-placeholder` to the real tag
4. Sync ArgoCD apps and verify pods come up healthy

Reviewed-on: #300
2026-03-17 16:42:53 -07:00
11330ebea0 Deploy Mealie recipe manager (#299)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (mealie) (push) Successful in 2s
Build Container / build (mealie) (push) Successful in 8s
## Summary

- Deploy Mealie (self-hosted recipe manager) on minikube-indri via ArgoCD
- Build container from source via forge mirror (`mirrors/mealie`) — multi-stage Dockerfile with Node.js frontend + Python/uv backend
- Add Caddy proxy entry for `meals.ops.eblu.me`
- Part of a larger meal planning pipeline: Mealie stores categorized recipes, a planner script selects balanced meals, and Ollama generates unified cooking timelines

## Status

- [x] Mirror mealie repo on forge
- [x] Dockerfile (from-source build)
- [x] ArgoCD app + k8s manifests
- [x] Caddy proxy entry
- [x] Service docs, routing table, app registry
- [ ] Local Dagger build test
- [ ] Container build + push to registry
- [ ] Update kustomization.yaml with real image tag
- [ ] Deploy and verify
- [ ] Provision Caddy

## Test plan

- Build container locally via `dagger call build --src=. --container-name=mealie`
- Trigger CI build via `mise run container-build-and-release mealie`
- Deploy from branch: `argocd app set mealie --revision deploy-mealie && argocd app sync mealie`
- Verify Mealie UI at `https://meals.ops.eblu.me`
- Verify API docs at `https://meals.ops.eblu.me/docs`

Reviewed-on: #299
2026-03-16 21:59:10 -07:00
4dc3e5cae2 Add UnPoller for UniFi network metrics (#298)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (unpoller) (push) Successful in 2s
Build Container / build (unpoller) (push) Successful in 7s
## Summary
- Deploy UnPoller as a k8s service on indri to export UniFi controller metrics to Prometheus
- Custom-built container from forge mirror (`containers/unpoller/Dockerfile`)
- Credentials pulled from 1Password via external-secrets
- Prometheus scrape job added, docs and service-versions updated

## Test plan
- [ ] Build container: `mise run container-release unpoller v2.34.0`
- [ ] Update kustomization tag with built image tag
- [ ] Deploy from branch: `argocd app set unpoller --revision feature/unpoller && argocd app sync unpoller`
- [ ] Verify pod connects to UX7 controller (check logs)
- [ ] Confirm `unpoller` target appears in Prometheus
- [ ] Query `unifi_` metrics in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #298
2026-03-16 15:52:45 -07:00
ac01c2d6e2 Fix stale docs and shell quoting in devpi start script
- ArgoCD ref: correct Git Source URL to forge.ops.eblu.me:2222
- Authentik ref: add Zot as active OIDC client, blueprint, and secret
- Federated login: remove Zot from Future Work (completed in PR #236)
- devpi/start.sh: use bash array for command building (proper quoting)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 19:25:27 -07:00
4f0476a851 Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 10s
## Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)

**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs

Reviewed-on: #290
2026-03-09 11:59:43 -07:00
ede9a51394 Fix robots.txt: block /explore/ and /tags/ (was /explorer/)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 3s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 16s
The previous robots.txt had a typo blocking /explorer/ instead of
/explore/, allowing Facebook's crawler to hit the spider trap.
Also block /tags/ which has the same infinite relative-link issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:57:45 -07:00
3a811fb188 Deploy JobSync — job search tracker on ringtail k3s (#288)
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (jobsync) (push) Successful in 2s
Build Container (Nix) / build (jobsync) (push) Successful in 8s
## Summary

C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.

### Mikado Graph

```
deploy-jobsync (goal)
├── build-jobsync-container
│   └── mirror-jobsync
└── integrate-jobsync-ollama
```

### What is JobSync?

Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.

### Key Decisions

- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed

### Current Status

Planning phase — cards committed, ready for review before implementation begins.

Reviewed-on: #288
2026-03-08 11:02:05 -07:00
24f7512d59 Bump runner-job-image Dagger CLI from 0.20.0 to 0.20.1
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 3s
Build Container (Nix) / build (runner-job-image) (push) Successful in 2s
Build Container / build (runner-job-image) (push) Successful in 2m28s
Phase 1 of Dagger upgrade: update the CLI in the runner container first
so CI can build the new image with the old engine version. See
[[upgrade-dagger]] for the full procedure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:32:05 -08:00
b64010b3c7 Replace spider-trap nginx 404s with robots.txt disallowing /explorer/
All checks were successful
Build Container (Nix) / detect (push) Successful in 3s
Build Container / detect (push) Successful in 3s
Build Container (Nix) / build (quartz) (push) Successful in 2s
Build Container / build (quartz) (push) Successful in 9s
The /explorer/ SPA endpoints were the source of all spider-trap traffic.
A robots.txt Disallow is a better fix than serving 404s — it prevents
crawlers from entering the infinite URL tree in the first place, avoids
serving large numbers of 404s that hurt SEO, and doesn't break legitimate
deep links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:34:37 -08:00
6636576cdc Add spider-trap guards to docs.eblu.me Quartz nginx config
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 12s
Block recursive crawler paths caused by SPA fallback + relative links:
/tags/ depth >1 returns 404, global depth ≥5 returns 404.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:43:41 -08:00
448689bf2a Bump runner-job-image Dagger CLI from 0.19.11 to 0.20.0
Some checks failed
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 1s
Build Container (Nix) / build (runner-job-image) (push) Successful in 2s
Build Container / build (runner-job-image) (push) Failing after 2s
The Dagger module was upgraded to v0.20.0 in d15071a but the runner job
image still had the old CLI, causing build-blumeops to fail with a
version mismatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:58:38 -08:00
f6f0f79a5b Bump kiwix-serve from 3.8.1 to 3.8.2
All checks were successful
Build Container (Nix) / detect (push) Successful in 4s
Build Container (Nix) / build (kiwix-serve) (push) Successful in 3s
Build Container / detect (push) Successful in 1m57s
Build Container / build (kiwix-serve) (push) Successful in 1m15s
Minor upstream release with doc and CI fixes. Also corrects kiwix.md
to reference the actual custom registry image and torrents.txt path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:12:32 -08:00
c93448f951 Bump transmission-exporter to v1.0.1
All checks were successful
Build Container / detect (push) Successful in 1s
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 2s
Build Container / build (transmission-exporter) (push) Successful in 6s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:04:26 -08:00
797133b28e Fix per-torrent rate panels showing cumulative bytes instead of rates
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 2s
Build Container / build (transmission-exporter) (push) Successful in 38s
Dashboard "Download/Upload Rate by Torrent" panels were querying
transmission_torrent_download_bytes (total_size * percent_done) and
transmission_torrent_upload_bytes (uploaded_ever) — cumulative byte
gauges, not rates. Added new metrics using Transmission's native
rate_download/rate_upload and updated dashboard queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:01:37 -08:00
f2704b26da Replace transmission-exporter with homegrown Python exporter (#283)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (transmission-exporter) (push) Successful in 2s
Build Container / build (transmission-exporter) (push) Successful in 19s
## Summary
- Replace unmaintained `metalmatze/transmission-exporter:master` sidecar with a homegrown Python exporter
- Uses `prometheus_client` + `transmission-rpc` with collect-on-scrape pattern (fresh metrics per scrape, no stale labels)
- Same metric names so existing Grafana Transmission dashboard works unchanged
- Container built with `uv` for dependency management, follows `grafana-sidecar` Dockerfile pattern

## Changes
- **New:** `containers/transmission-exporter/exporter.py` — single-file exporter (~130 lines)
- **New:** `containers/transmission-exporter/Dockerfile` — multi-stage Alpine build with uv
- **Modified:** `argocd/manifests/torrent/deployment.yaml` — swap sidecar image reference
- **Modified:** `argocd/manifests/torrent/kustomization.yaml` — add image tag entry
- **Modified:** `service-versions.yaml` — add transmission-exporter entry

## Deployment and Testing
- [ ] Build container: `mise run container-build-and-release transmission-exporter`
- [ ] Update kustomization.yaml newTag with build SHA
- [ ] Branch deploy: `argocd app set torrent --revision feature/transmission-exporter-python && argocd app sync torrent`
- [ ] Verify metrics: `kubectl -n torrent --context=minikube-indri port-forward svc/transmission 19091:19091` then `curl localhost:19091/metrics | grep transmission_`
- [ ] Verify Grafana Transmission dashboard panels populate
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

Reviewed-on: #283
2026-03-04 21:55:00 -08:00
b460333da0 Upgrade Transmission to 4.1.1 (#282)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (transmission) (push) Successful in 2s
Build Container / build (transmission) (push) Successful in 6s
## Summary
- Upgrade Transmission from 4.0.6-r4 to 4.1.1-r1
- Uses Alpine edge community repo for transmission packages, keeping stable alpine:3.22 base
- Fix stale image reference in service doc (was linuxserver, now custom registry image)
- Mark transmission as reviewed in service-versions.yaml

## Context
Service review found Transmission two minor versions behind (4.0.6 → 4.1.1). Alpine 3.22 only packages 4.0.6, so transmission is installed from edge's community repo with an exact version pin.

4.1.0 added improved µTP performance, IPv6/dual-stack UDP tracker, JSON-RPC 2.0 API. 4.1.1 is a bugfix release (20+ fixes).

Dagger test build passed locally.

## Deployment and Testing
- [ ] Build container via Forgejo workflow (`mise run container-build-and-release transmission`)
- [ ] Update kustomization.yaml with new image tag
- [ ] `argocd app set torrent --revision feature/transmission-review && argocd app sync torrent`
- [ ] Verify web UI at https://torrent.ops.eblu.me
- [ ] Check Grafana Transmission dashboard still receives metrics
- [ ] After merge: `argocd app set torrent --revision main && argocd app sync torrent`

## Note
The transmission-exporter sidecar (OOMKilling every ~30min, 294 restarts) is being tracked separately as a future replacement project.

Reviewed-on: #282
2026-03-04 07:44:33 -08:00
a2bb9abbdb Home-build grafana-sidecar container (#281)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (grafana-sidecar) (push) Successful in 2s
Build Container / build (grafana-sidecar) (push) Successful in 6s
## Summary
- Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0`
- Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop
- Adds Dockerfile, service-versions entry, docs, and changelog fragment
- Manifest switch to home-built image pending container build

## Deployment and Testing
- [ ] `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization.yaml with built image tag
- [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana`
- [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me
- [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana`

Reviewed-on: #281
2026-03-03 13:48:24 -08:00
3dc4ed730b Build Loki container image locally (#280)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (loki) (push) Successful in 2s
Build Container / build (loki) (push) Successful in 7s
## Summary
- Add two-stage Dockerfile for Loki (Go build → Alpine runtime) in `containers/loki/`
- Rewrite kustomize image to `registry.ops.eblu.me/blumeops/loki`
- Tag is `v3.6.5-placeholder` until first CI build; will be updated post-build

## Details
- UID 10001 matches existing StatefulSet `securityContext` (runAsUser/fsGroup)
- CGO_ENABLED=0, ldflags embed version via `github.com/grafana/loki/v3/pkg/util/build`
- Clones from `forge.ops.eblu.me/mirrors/loki` (mirror created this session)
- Pattern follows miniflux (two-stage Go) + prometheus (ldflags)

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release loki`
- [ ] Update kustomize tag to actual build tag
- [ ] Deploy from branch: `argocd app set loki --revision feature/loki-container && argocd app sync loki`
- [ ] Verify `/ready` endpoint and log ingestion
- [ ] After merge: update to `[main]` tag (C0 follow-up)

Reviewed-on: #280
2026-03-03 13:00:43 -08:00
eb9bc57351 Upgrade TeslaMate v2.2.0 → v3.0.0 (#279)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (teslamate) (push) Successful in 2s
Build Container / detect (push) Successful in 26s
Build Container / build (teslamate) (push) Successful in 4m22s
## Summary
- Upgrade TeslaMate from v2.2.0 to v3.0.0 (first service review)
- Elixir 1.18 → 1.19.5, runtime base bookworm → trixie
- Adds zstd/brotli build deps for new static asset compression
- DB migration (BTREE → BRIN indexes) runs automatically via entrypoint

## Deployment and Testing
- [ ] Trigger container build: `mise run container-build-and-release teslamate`
- [ ] Update kustomization.yaml with new image tag
- [ ] Deploy from branch: `argocd app set teslamate --revision upgrade/teslamate-v3.0.0 && argocd app sync teslamate`
- [ ] Verify TeslaMate UI loads and data is intact
- [ ] Check logs for migration errors
- [ ] After merge: reset ArgoCD to main, update kustomization tag to `[main]` image

Reviewed-on: #279
2026-03-03 11:56:40 -08:00
2d4098e480 Fix authentik 2026.2.0 migration ordering bug (#275)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container (Nix) / detect (push) Successful in 1s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 3m6s
## Summary

- Patch `authentik_rbac/0010` migration to depend on `authentik_core/0056`, fixing non-deterministic ordering that crashes startup with `FieldError: Cannot resolve keyword 'group_id'`
- Upstream bug: goauthentik/authentik#19616, #20634 — no fix released yet
- Document the issue in the lessons-learned table

## Deployment and Testing

- [ ] CI builds container image
- [ ] Deploy from branch: `argocd app set authentik --revision fix/authentik-migration-ordering && argocd app sync authentik`
- [ ] Pods reach Running/Ready without crash-looping
- [ ] `kubectl logs` show 0056 migrating before 0010
- [ ] authentik UI loads at authentik.ops.eblu.me
- [ ] `mise run services-check`
- [ ] After merge: `argocd app set authentik --revision main && argocd app sync authentik`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/275
2026-03-01 16:28:36 -08:00
78027eb783 Fix authentik: entry_points API for Python 3.14
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 3m7s
Python 3.14's EntryPoints uses string keys, not integer indices.
eps[0] raises KeyError(0); use next(iter(eps)) instead.
Verified on ringtail with the actual venv python.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:02:49 -08:00
e49d966229 Fix authentik: symlink authentik/ for BASE_DIR resolution
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 3m4s
Django's BASE_DIR is $out but source lives in site-packages. Code
like BASE_DIR / "authentik" / "sources" / "scim" / "schemas" / ...
needs a top-level symlink to find data files alongside Python source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:55:26 -08:00
b7bfb0bfae Fix authentik container: set TMPDIR=/tmp
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 45s
lifecycle/ak uses ${TMPDIR}/authentik-mode — without TMPDIR set it
tries to write /authentik-mode in root, which user 65534 can't do.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:52:36 -08:00
2ac353b7bf Fix authentik container: create /tmp for unprivileged user
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (authentik) (push) Successful in 1s
Build Container (Nix) / build (authentik) (push) Successful in 54s
buildLayeredImage doesn't create /tmp by default. The container runs
as user 65534 (nobody) which can't mkdir /tmp at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:48:05 -08:00
efa9806bfa C2: Build authentik from source (Mikado chain) (#274)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container / build (authentik) (push) Successful in 2s
Build Container (Nix) / build (authentik) (push) Successful in 22s
## Mikado Chain: build-authentik-from-source

Replace `pkgs.authentik` from nixpkgs with a custom Nix derivation built from source.
This removes the dependency on the nixpkgs packaging timeline and gives full version control.

Target version: **2025.12.4** (nixpkgs reference, upgrading from deployed 2025.10.1).

### Dependency Graph

```
build-authentik-from-source (goal)
├── authentik-go-server-derivation
│   ├── authentik-api-client-generation  ← IN PROGRESS
│   └── authentik-python-backend-derivation
├── authentik-web-ui-derivation
│   └── authentik-api-client-generation  ← IN PROGRESS
└── authentik-python-backend-derivation
```

### Ready Leaves
- `authentik-api-client-generation` — Go + TypeScript client generation from OpenAPI schema
- `authentik-python-backend-derivation` — Django backend with 60+ deps, 4 in-tree packages

### Architecture
Ported from [nixpkgs `pkgs/by-name/au/authentik/package.nix`](https://github.com/NixOS/nixpkgs/tree/master/pkgs/by-name/au/authentik):
- `source.nix` — shared version/source fetch
- `client-go.nix` — Go API client generation
- `client-ts.nix` — TypeScript API client generation
- `api-go-vendor-hook.nix` — Go vendor directory injection hook
- (more components to follow as leaves are closed)

### Related Cards
- [[build-authentik-from-source]] — Goal card
- [[authentik-api-client-generation]]
- [[authentik-python-backend-derivation]]
- [[authentik-web-ui-derivation]]
- [[authentik-go-server-derivation]]

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/274
2026-03-01 13:45:00 -08:00
33b7f0f353 Switch prometheus, teslamate, miniflux to forge mirrors
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (miniflux) (push) Successful in 3s
Build Container (Nix) / build (prometheus) (push) Successful in 3s
Build Container (Nix) / build (teslamate) (push) Successful in 2s
Build Container / build (miniflux) (push) Successful in 1m14s
Build Container / build (teslamate) (push) Successful in 13m42s
Build Container / build (prometheus) (push) Successful in 15m20s
Created miniflux mirror at mirrors/miniflux. All three containers
now clone from forge.ops.eblu.me/mirrors/ instead of GitHub directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 21:01:08 -08:00
cd578144f7 Migrate upstream mirrors to mirrors/ Forgejo org (#265)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container (Nix) / build (homepage) (push) Successful in 3s
Build Container (Nix) / build (navidrome) (push) Successful in 3s
Build Container (Nix) / build (ntfy) (push) Successful in 8s
Build Container / detect (push) Successful in 42s
Build Container / build (navidrome) (push) Successful in 9m37s
Build Container / build (homepage) (push) Successful in 9m56s
Build Container / build (ntfy) (push) Successful in 2m35s
## Summary

- Created `mirrors` Forgejo organization for upstream mirror repos
- Transferred 22 mirror repos from `eblume/` to `mirrors/` (mirror sync config preserved)
- Deleted unused repos: hajimari, hister
- Updated all container build URLs (homepage, navidrome, ntfy Dockerfiles + nix)
- Updated documentation references (migrate-forgejo-from-brew, upstream-fork-strategy, fix-ntfy-nix-version)
- `dotfiles` intentionally kept under `eblume/` per user request
- `devpi` transferred to `mirrors/`

Repos remaining under `eblume/`: blumeops, cv, mcquack, dotfiles

## Cleanup TODO

- [ ] Delete temp Forgejo API token "claude-migration-temp" (Settings > Applications)

## Test Plan

- [x] Verified mirror config (mirror=true, original_url) survived transfer on test repo (tesla_auth)
- [x] All pre-commit hooks pass (including container-version-check, docs-check-links)
- [ ] Verify a mirror repo sync runs successfully after transfer (check mirrors/authentik or similar)
- [ ] Rebuild containers from branch to verify Dockerfile URLs resolve

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/265
2026-02-24 20:43:14 -08:00
2ba5d8a8aa Port Prometheus to local container build (#262)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (prometheus) (push) Successful in 2s
Build Container / build (prometheus) (push) Successful in 7s
## Summary
- Add three-stage Dockerfile for Prometheus v3.9.1 (Node UI → Go binaries → Alpine runtime)
- Produces `prometheus` and `promtool` binaries with embedded web UI assets
- Follows navidrome/ntfy pattern for supply chain control via Zot registry

## Deployment and Testing
- [ ] `dagger call build --src=. --container-name=prometheus` succeeds
- [ ] Container reports correct version via `prometheus --version`
- [ ] `promtool --version` works
- [ ] Update statefulset image reference after successful build
- [ ] Deploy from branch: `argocd app set prometheus --revision <branch> && argocd app sync prometheus`
- [ ] Health probes pass (`/-/healthy`, `/-/ready`)
- [ ] Web UI loads, scrape targets work, remote write functions

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/262
2026-02-24 09:15:57 -08:00