Commit graph

981 commits

Author SHA1 Message Date
50dfdba4e6 Add Firefox, remove claude-cli:// handler workarounds
The xdg desktop entry and Librewolf user.js prefs didn't fix the
OAuth callback hang. Try stock Firefox instead as a simpler path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:42:36 -07:00
dd1cf4f198 Configure Librewolf to delegate claude-cli:// URIs to xdg-open
The xdg desktop entry and mimeapps were already registered but
Librewolf doesn't delegate unknown URI schemes to the system
handler by default. This adds user.js prefs to complete the chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:37:16 -07:00
68f845e773 Add changelog fragment for forge robots.txt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:40:34 -07:00
7f6bbdc82c Add robots.txt to forge.eblu.me blocking crawlers from /mirrors/
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m19s
Facebook has been scraping forge mirror repos at ~3-4 req/s, slowing
down the Forgejo instance. Serve robots.txt directly from nginx to
disallow /mirrors/ while leaving eblume/* accessible to crawlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:39:48 -07:00
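For commit 7f6bbdc82c, here is a minimal sketch of the rule's intended effect, checked with Python's standard-library `urllib.robotparser`. The robots.txt body is an assumption, because the commit only says `/mirrors/` is disallowed while `eblume/*` stays crawlable. The repo paths and the user-agent string are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body served by nginx; the exact file contents are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirrors/
"""


def build_parser(body: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# Illustrative paths: one under /mirrors/, one under the eblume/ namespace.
for path in ("/mirrors/some-repo", "/eblume/blumeops"):
    url = f"https://forge.eblu.me{path}"
    print(path, parser.can_fetch("facebookexternalhit", url))
```

robots.txt is advisory. A crawler that ignores it would still need a rate limit or deny rule at nginx.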
5ec2411e20 Update navidrome, miniflux, forgejo-runner image tags to Alpine 3.23 builds [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:37:30 -07:00
3ecd888537 Switch container builds to manual-only workflow dispatch
Shared Dagger helpers (src/blumeops/) affect all Dagger-built containers,
making path-based auto-triggers unreliable. All builds now go through
`mise run container-build-and-release <name>`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:25:14 -07:00
352b95c141 Refactor Dagger go_build() helper and standardize Alpine 3.23
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 10m2s
Build Container / build-dagger (forgejo-runner) (push) Successful in 10m2s
Extend go_build() with buildmode and extra_env params, migrate miniflux
and forgejo-runner to use it, and bump all Alpine bases from 3.22 to 3.23.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:10:46 -07:00
99f78c8745 Register claude-cli:// URI handler on ringtail for Claude Code OAuth
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:42:52 -07:00
fb1e8ff672 Deploy transmission containers from Dagger builds
Update kustomization image tags to the new container.py-built images
(v4.1.1-r1-2c483ce, v1.0.1-2c483ce).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:34:28 -07:00
2c483cefff Migrate transmission containers from Dockerfile to Dagger builds
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (transmission-exporter) (push) Successful in 2m29s
Build Container / build-dagger (transmission) (push) Successful in 2m29s
Replace Dockerfiles with native container.py for both transmission and
transmission-exporter. Updates base images (Alpine 3.23, Python 3.14),
pins uv to 0.11.6 instead of :latest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:26:00 -07:00
519175c672 Fix borgmatic LaunchAgent TCC dialog hang by removing mise wrapper
LaunchAgents now call borgmatic directly at its mise-installed path
instead of routing through `mise x`, which triggered macOS TCC
permission dialogs (e.g. "mise wants to access Documents") that hung
headless sessions and caused backup failures.

Also adds `mise install` to the ansible role so borgmatic installation
is fully managed, and pins the version in both mise.toml and the role
defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:23:46 -07:00
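For commit 519175c672, the sketch below uses Python's `plistlib` to show what "call borgmatic directly" means at the LaunchAgent level. The label, schedule, and binary path are hypothetical placeholders. They are not the values the ansible role templates.

```python
import plistlib

# Hypothetical values: the real label, schedule, and mise install path come
# from the ansible role and are not shown in this commit.
LABEL = "com.example.borgmatic"
BORGMATIC_BIN = "/path/to/mise/installs/borgmatic/<version>/bin/borgmatic"


def launch_agent_plist(label: str, program: str, args: list[str]) -> bytes:
    """Build a LaunchAgent plist whose first argument is the borgmatic binary itself."""
    agent = {
        "Label": label,
        # launchd execs this path directly instead of going through `mise x`,
        # the wrapper the commit identifies as the source of the TCC dialogs.
        "ProgramArguments": [program, *args],
        # Assumed schedule: daily at 02:00.
        "StartCalendarInterval": {"Hour": 2, "Minute": 0},
    }
    return plistlib.dumps(agent)


print(launch_agent_plist(LABEL, BORGMATIC_BIN, ["--verbosity", "1"]).decode())
```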
30ed018fd8 Update prowler image tag to v5.23.0-7c1cd11 [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:51:26 -07:00
7c1cd11e45 Upgrade Prowler to 5.23.0, remove registry workaround (#336)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (prowler) (push) Successful in 36s
## Summary

- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that upstream fix (PR prowler-cloud/prowler#10470) is released

The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.

## Test plan

- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #336
2026-04-14 13:45:28 -07:00
6b690eb033 Review CC sso-gated-admin-tools: scope to ArgoCD only
Removed Grafana from the control description — no Prowler finding
references it. Tightened scope to match actual usage (ArgoCD wildcard
RBAC mute). Added workflow-bot scoping note.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:07:52 -07:00
be30668eef Automate Prowler MANUAL finding verification (#335)
## Summary
- Adds automated node-level verification to `review-compliance-reports`: kubelet file perms/ownership, kubelet config args, etcd CA separation, RBAC cluster-admin bindings
- Mutes the 14 MANUAL Prowler findings via new `manual-node-checks.yaml` mutelist file
- New `node-config-automated-verification` compensating control documents the approach
- Script fails loudly (red FAIL + verdict panel) if any check deviates from expected values

## Test plan
- [x] `mise run review-compliance-reports` — all 12 node checks PASS
- [x] Injected bad expected value (perms 400 vs actual 600) — FAIL rendered correctly
- [x] Fixed colon-in-binding-name bug (kubeadm:cluster-admins) with tab-separated jsonpath
- [ ] After merge: sync prowler mutelist ConfigMap and verify next scan shows 0 MANUAL findings

## Note
Prowler coverage is minikube-indri only — ringtail/k3s is a known gap tracked separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #335
2026-04-14 13:00:44 -07:00
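For PR #335, this sketch shows why a tab-separated jsonpath fixes the `kubeadm:cluster-admins` case. It assumes the verification script reads `name<TAB>roleRef.name` lines. The sample output and function names are hypothetical and do not come from the script itself.

```python
# Hypothetical output of:
#   kubectl get clusterrolebindings \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.roleRef.name}{"\n"}{end}'
SAMPLE = (
    "cluster-admin\tcluster-admin\n"
    "kubeadm:cluster-admins\tcluster-admin\n"
    "system:basic-user\tsystem:basic-user\n"
)


def parse_bindings(output: str) -> dict[str, str]:
    """Parse `name<TAB>role` lines into {binding name: referenced ClusterRole}."""
    bindings: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the tab only. A ':' delimiter would break
        # "kubeadm:cluster-admins" into two fields.
        name, role = line.split("\t", 1)
        bindings[name] = role
    return bindings


def cluster_admin_bindings(output: str) -> list[str]:
    """Names of bindings that grant the cluster-admin ClusterRole."""
    return sorted(name for name, role in parse_bindings(output).items() if role == "cluster-admin")


print(cluster_admin_bindings(SAMPLE))
```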
Forgejo Actions
8c2f035e6d Update docs release to v1.15.6
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:46:42 -07:00
04b44b350b Add changelog for ArgoCD token rotation after DR [v1.15.6]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:45:00 -07:00
Forgejo Actions
f2514a6f02 Update docs release to v1.15.5
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:29:27 -07:00
9d85c97b9b Update forgejo-runner kustomization tag to main-branch image [v1.15.5]
C0 follow-up: switch from branch-built tag to main-built v12.7.3-0e93cc0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:10:36 -07:00
0e93cc08b4 Build forgejo-runner container locally (#334)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (forgejo-runner) (push) Successful in 1m21s
## Summary
- Add native Dagger `container.py` for forgejo-runner (Go + Alpine runtime, static binary with CGO for SQLite)
- Update kustomization to point to local registry image (tag is placeholder until CI builds)
- Uses existing `clone_from_forge("forgejo-runner", ...)` mirror

## Test plan
- [x] `dagger call build --src=. --container-name=forgejo-runner` passes locally
- [ ] CI container build from branch succeeds
- [ ] Update kustomization tag to built image, deploy from branch via ArgoCD `--revision`
- [ ] Verify runner registers and picks up jobs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #334
2026-04-14 11:06:36 -07:00
223b134776 Document uv.lock as the source of devpi dependency in Dagger builds
The lockfile bakes in devpi URLs — Dagger does a locked install, not
fresh resolution. This is the mechanism behind the cold-cache failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:41:45 -07:00
ccaef4c1a7 Document devpi cold cache failure mode and deploy teslamate v3.0.0-08c698e
After a DR rebuild, devpi's empty cache causes race conditions under
concurrent load — metadata is served but wheel files 404. Also deploys
the first container.py-built teslamate image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:38:06 -07:00
08c698e833 Migrate teslamate to native Dagger container.py (#333)
Some checks failed
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (teslamate) (push) Failing after 6s
## Summary
- Replace legacy Dockerfile with native Dagger `container.py` build
- Two-stage pipeline: Elixir+Node builder, Debian slim runtime
- Uses shared helpers (`clone_from_forge`, `oci_labels`)
- Delete old Dockerfile (pipeline auto-discovers container.py)
- Update build-container-image docs and mark service reviewed

## Test plan
- [x] `dagger call build --src=. --container-name=teslamate` succeeds locally
- [ ] CI container build passes
- [ ] Deploy from branch and verify teslamate starts cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #333
2026-04-14 07:20:52 -07:00
4ca0630d76 Review enforce-tag-immutability doc: add review date and zot reference link
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:00:55 -07:00
d7c3c687f4 Document DR rebuild procedure and update restart-indri
- New how-to: rebuild-minikube-cluster with full bootstrap procedure
  validated during 2026-04-13 DR event
- Update restart-indri: warn about minikube delete, macOS permission
  dialog on first Tailscale SSH, forgejo_actions_secrets dep cycle
- Update disaster-recovery reference: link to rebuild procedure
- Update CLAUDE.md: never run minikube delete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:07:54 -07:00
405dab8b59 Add changelog fragments for DR recovery work
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:59:16 -07:00
cd5b6b63f7 Add paperless DB to borgmatic backups
Discovered during DR that paperless was the only service DB not backed
up by borgmatic. Uses same blumeops-pg cluster on port 5432.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:58:06 -07:00
2d2d495f95 Fix paperless redis: use upstream valkey instead of amd64-only nix image
The authentik-redis image is nix-built on ringtail (amd64 only) and was
previously running under QEMU emulation on arm64 minikube. Discovered
during DR recovery when fresh minikube lacked binfmt registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:48:20 -07:00
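For commit 2d2d495f95, the sketch below checks whether a multi-arch OCI image index lists an arm64 variant before the image is scheduled on arm64 minikube. The index JSON is hypothetical. A single-platform image has no index at all and records its architecture in the image config instead, and this sketch does not handle that case.

```python
import json

# Hypothetical image index, shaped like the raw JSON from
# `docker buildx imagetools inspect --raw <image>`. Only amd64 is listed.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
      "size": 1234,
      "platform": {"architecture": "amd64", "os": "linux"}
    }
  ]
}
"""


def supported_platforms(index: dict) -> set[tuple[str, str]]:
    """Collect (os, architecture) pairs advertised by an OCI image index."""
    return {
        (m["platform"]["os"], m["platform"]["architecture"])
        for m in index.get("manifests", [])
        if "platform" in m
    }


def runs_natively(index: dict, os_name: str = "linux", arch: str = "arm64") -> bool:
    # Without a matching entry, an arm64 node depends on binfmt/QEMU emulation,
    # which the fresh minikube in this commit did not have registered.
    return (os_name, arch) in supported_platforms(index)


index = json.loads(INDEX_JSON)
print(runs_natively(index))
```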
fca3010042 Hints about service version tracking
2026-04-13 08:40:49 -07:00
22a417ac3c Oops, looks like a log file got lost, nbd
2026-04-13 08:36:20 -07:00
f61bb4f2e7 Add uv.lock for version pinning of dagger pipeline
2026-04-13 08:35:01 -07:00
b5551e227e Route Dagger build telemetry to Tempo
The Dagger engine's internal OTLP proxy returns 500 on /v1/metrics when
there's no real backend, causing ~9s retry warnings per pipeline step.
Point OTEL_EXPORTER_OTLP_ENDPOINT at Tempo to give it a real endpoint.
Also removes the stale os.environ workaround from main.py (the SDK
initializes telemetry before our module loads, so it had no effect).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:27:12 -07:00
ab834b641a Fix OTEL metrics exporter warnings in Dagger builds
The Dagger engine shim sets OTEL_METRICS_EXPORTER before our module
loads, so os.environ.setdefault was a no-op. Switch to a hard override.
Remove the redundant workflow-level env var since the fix belongs in
the module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:11:15 -07:00
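For commit ab834b641a, this sketch shows only the `os.environ` semantics behind the fix: `setdefault` writes a value only when the key is absent, so it cannot replace a value the engine shim already set. The "otlp" value simulates the shim. As the later commit b5551e227e notes, the SDK initializes telemetry before the module loads, so this in-module workaround was eventually removed.

```python
import os


def demo() -> tuple[str, str]:
    # Simulate the Dagger engine shim having set the variable before our module loads.
    os.environ["OTEL_METRICS_EXPORTER"] = "otlp"

    # setdefault only writes when the key is absent, so this is a no-op here.
    os.environ.setdefault("OTEL_METRICS_EXPORTER", "none")
    after_setdefault = os.environ["OTEL_METRICS_EXPORTER"]

    # A plain assignment overrides whatever the shim set.
    os.environ["OTEL_METRICS_EXPORTER"] = "none"
    after_override = os.environ["OTEL_METRICS_EXPORTER"]
    return after_setdefault, after_override


print(demo())  # ('otlp', 'none')
```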
db6d8af8b1 Update grafana-sidecar image tag to v2.6.0-61fcd5d (merge build)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:02:39 -07:00
61fcd5d70a Upgrade grafana-sidecar 1.28.0 → 2.6.0 + container.py port (#332)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dagger (grafana-sidecar) (push) Successful in 1m50s
## Summary

- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note

## Test plan

- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`

Reviewed-on: #332
2026-04-13 07:57:13 -07:00
6455d93cb3 Review local-registry control: fix inaccurate description, enumerate exceptions
The control claimed all images came from the private registry, but 12+
services pull from external public registries. Updated description to
reflect reality and catalogued external-image categories in notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:59:37 -07:00
6e60287e99 Doc review: delete install-dagger-on-nix-runner, add service-versions ref card
Outdated leaf card removed; zot.md now links to new service-versions
reference card instead. Added reverse link from review-services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:52:38 -07:00
8d80a4a3a5 Rewrite runner-logs: API-based log fetching, multi-repo support
Replace broken SSH+filesystem log retrieval with Forgejo web API
endpoint. Fix CLI to use run numbers (not task IDs), add --repo
for querying any forge repo (e.g. sporks), --limit/-n for listing
size. Document runner-logs as the way to verify build success in
CLAUDE.md and container build docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:42:58 -07:00
a18ec9d958 Update miniflux to main image tag, disable OTEL metrics in Dagger module
Point miniflux kustomization at the main-built v2.2.19-138e23d image
(replacing the branch tag). Disable the OTLP metrics exporter at module
import time to prevent ~11s retry delays in CI — the env var must be set
inside the module, not the runner shell, because the SDK runs inside the
Dagger engine container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:59:32 -07:00
138e23d525 Miniflux 2.2.19 + container.py migration + ty typechecker (#331)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 1m3s
## Summary

- Upgrade miniflux from 2.2.17 to 2.2.19 (security hardening, performance improvements)
- Migrate miniflux from Dockerfile to native Dagger container.py build
- Refactor `alpine_runtime()` helper to support existing users (nobody/65534)
- Add `ty` (Astral) Python typechecker to prek hooks

## Test plan

- [ ] `dagger call build --src=. --container-name=miniflux` succeeds
- [ ] `dagger call container-version --container-name=miniflux` returns 2.2.19
- [ ] `mise run container-version-check` passes
- [ ] `ty check` passes cleanly
- [ ] `prek run --all-files` passes
- [ ] CI builds container successfully
- [ ] Miniflux healthcheck passes after deploy from branch

Reviewed-on: #331
2026-04-12 08:54:32 -07:00
dc5bffdd97 Update ringtail flake inputs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:14:46 -07:00
c06eccc61c Review hosts.md: add last-reviewed, normalize links, add reference tag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:06:53 -07:00
94c937d588 Disable OTLP metrics exporter in CI, update navidrome to main tag
The Dagger Python SDK's OTLP metrics exporter hits a non-functional
local endpoint (500s), burning ~9s per retry cycle. Set
OTEL_METRICS_EXPORTER=none in the build-dagger CI job.

Also update navidrome kustomization to the main-SHA tag (c86b5d7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:26:25 -07:00
c86b5d7772 Native Dagger container builds + Navidrome v0.61.1 (#330)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (navidrome) (push) Successful in 22m26s
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers

## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.

## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```

## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me

## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.

Reviewed-on: #330
2026-04-11 17:11:56 -07:00
4fc0192731 Track Fly.io proxy component versions in service-versions.yaml
Add flyio-tailscale (v1.94.1), flyio-nginx (1.29.6-alpine), and
flyio-alloy (v1.14.1) entries with new `fly` service type so future
upgrades go through the service-review workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:40:57 -07:00
e02305e72d Pin Fly.io Tailscale to v1.94.1 to fix MagicDNS regression in v1.96.5
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m20s
Tailscale :stable pulled v1.96.5 during last deploy, which returns
SERVFAIL for tailnet DNS names (no upstream resolvers set). This broke
all public routing (forge/docs/cv.eblu.me) through the Fly proxy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:32:38 -07:00
b08b1a833f Fix services-check to show all firing alerts per alert name
check_alert() used head -1 to display only the first firing instance,
silently swallowing additional alerts (e.g. frigate pod-not-ready was
hidden behind ollama).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:10:09 -07:00
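For commit b08b1a833f, here is a minimal sketch of reporting every firing instance under each alert name rather than only the first, which is what `head -1` did. The alert payloads are hypothetical and loosely shaped like entries from Prometheus's `/api/v1/alerts`. The real `check_alert()` is a shell function whose details are not shown here.

```python
from collections import defaultdict

# Hypothetical alerts, shaped like entries from Prometheus's /api/v1/alerts.
ALERTS = [
    {"labels": {"alertname": "PodNotReady", "pod": "ollama-0"}, "state": "firing"},
    {"labels": {"alertname": "PodNotReady", "pod": "frigate-0"}, "state": "firing"},
    {"labels": {"alertname": "PodNotReady", "pod": "miniflux-0"}, "state": "pending"},
]


def firing_by_name(alerts: list[dict]) -> dict[str, list[str]]:
    """Group every firing alert under its alertname instead of keeping only the first."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for alert in alerts:
        if alert["state"] != "firing":
            continue
        labels = alert["labels"]
        grouped[labels["alertname"]].append(labels.get("pod", "<no pod label>"))
    return dict(grouped)


for name, pods in firing_by_name(ALERTS).items():
    # A `head -1` style check would show only pods[0] and hide the rest.
    print(f"{name}: {len(pods)} firing ({', '.join(pods)})")
```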
a75f28e073 Fix fly.io proxy rate limit to key on real client IP
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m24s
The general rate limit zone used $binary_remote_addr (Fly's internal
proxy IP), causing all external clients to share one bucket. Switch to
$http_fly_client_ip to match forge_auth's correct behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:00:33 -07:00
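For commit a75f28e073, this sketch shows how the choice of rate-limit key decides which clients share a bucket. nginx's `limit_req` uses a leaky-bucket algorithm. The sketch substitutes a simpler per-key count within one window, which is enough to show the effect of the key. The addresses are hypothetical: one placeholder proxy address, plus documentation-range client IPs standing in for the `Fly-Client-IP` header, which nginx exposes as `$http_fly_client_ip`.

```python
from collections import Counter
from typing import Callable


def over_limit(requests: list[dict], key_fn: Callable[[dict], str], limit: int) -> set[str]:
    """Return bucket keys that exceed `limit` requests within a single window."""
    counts = Counter(key_fn(r) for r in requests)
    return {key for key, n in counts.items() if n > limit}


# Hypothetical traffic: all 12 requests reach nginx from the same proxy
# address, while the Fly-Client-IP header carries 4 distinct real clients.
REQUESTS = [
    {"remote_addr": "10.0.0.1", "fly_client_ip": f"203.0.113.{i % 4}"}
    for i in range(12)
]

# Keyed on the proxy address ($binary_remote_addr): one shared bucket trips.
print(over_limit(REQUESTS, lambda r: r["remote_addr"], limit=5))  # {'10.0.0.1'}
# Keyed on the real client ($http_fly_client_ip): 3 requests each, none trip.
print(over_limit(REQUESTS, lambda r: r["fly_client_ip"], limit=5))  # set()
```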
40556e5a2d Review gandi.md: add missing forge.eblu.me CNAME record
The Pulumi code has had a forge.eblu.me CNAME since it was added, but the
doc's DNS table only listed docs and cv. Also fixed the __main__.py
description to mention CNAMEs alongside A records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:54:46 -07:00
5757df115d Upgrade ollama from 0.17.5 to 0.20.4
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:42:05 -07:00