Commit graph

878 commits

Author SHA1 Message Date
7f6bbdc82c Add robots.txt to forge.eblu.me blocking crawlers from /mirrors/
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m19s
Facebook has been scraping forge mirror repos at ~3-4 req/s, slowing
down the Forgejo instance. Serve robots.txt directly from nginx to
disallow /mirrors/ while leaving eblume/* accessible to crawlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:39:48 -07:00
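A minimal sketch of the policy described in 7f6bbdc82c, checked with Python's standard `urllib.robotparser`. The robots.txt body, the crawler name, and the repository paths are assumptions for illustration; the commit does not show the file nginx actually serves.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body; the real file is not shown in the commit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirrors/
"""


def build_parser(body: str) -> RobotFileParser:
    """Parse a robots.txt body locally, without any network fetch."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical crawler name and repository paths.
mirror_allowed = parser.can_fetch(
    "ExampleBot", "https://forge.eblu.me/mirrors/some-repo"
)
own_allowed = parser.can_fetch(
    "ExampleBot", "https://forge.eblu.me/eblume/some-repo"
)
print(mirror_allowed, own_allowed)
```

robots.txt is advisory, so a rule like this only helps with crawlers that choose to honor it.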
5ec2411e20 Update navidrome, miniflux, forgejo-runner image tags to Alpine 3.23 builds [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:37:30 -07:00
3ecd888537 Switch container builds to manual-only workflow dispatch
Shared Dagger helpers (src/blumeops/) affect all Dagger-built containers,
making path-based auto-triggers unreliable. All builds now go through
`mise run container-build-and-release <name>`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:25:14 -07:00
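One way to picture the problem 3ecd888537 describes: a path-based trigger that only looks under `containers/<name>/` sees nothing when a shared helper changes, even though every Dagger-built container may be affected. This is a hypothetical sketch, not the workflow's actual detect logic; the function names and the helper file name are invented for illustration.

```python
from pathlib import PurePosixPath

# Shared helper directory named in the commit message.
SHARED_PREFIX = "src/blumeops/"


def naive_affected(changed_paths: list[str]) -> set[str]:
    """Path-based detection: collect <name> from containers/<name>/... paths."""
    affected = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == "containers":
            affected.add(parts[1])
    return affected


def touches_shared_helpers(changed_paths: list[str]) -> bool:
    """True when any changed file lives under the shared helper directory."""
    return any(p.startswith(SHARED_PREFIX) for p in changed_paths)


# Hypothetical change set: only a shared helper file was edited.
changed = ["src/blumeops/helpers.py"]
print(naive_affected(changed))          # no container detected
print(touches_shared_helpers(changed))  # yet every container may be affected
```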
352b95c141 Refactor Dagger go_build() helper and standardize Alpine 3.23
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 10m2s
Build Container / build-dagger (forgejo-runner) (push) Successful in 10m2s
Extend go_build() with buildmode and extra_env params, migrate miniflux
and forgejo-runner to use it, and bump all Alpine bases from 3.22 to 3.23.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:10:46 -07:00
99f78c8745 Register claude-cli:// URI handler on ringtail for Claude Code OAuth
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:42:52 -07:00
fb1e8ff672 Deploy transmission containers from Dagger builds
Update kustomization image tags to the new container.py-built images
(v4.1.1-r1-2c483ce, v1.0.1-2c483ce).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:34:28 -07:00
2c483cefff Migrate transmission containers from Dockerfile to Dagger builds
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (transmission-exporter) (push) Successful in 2m29s
Build Container / build-dagger (transmission) (push) Successful in 2m29s
Replace Dockerfiles with native container.py for both transmission and
transmission-exporter. Updates base images (Alpine 3.23, Python 3.14),
pins uv to 0.11.6 instead of :latest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:26:00 -07:00
519175c672 Fix borgmatic LaunchAgent TCC dialog hang by removing mise wrapper
LaunchAgents now call borgmatic directly at its mise-installed path
instead of routing through `mise x`, which triggered macOS TCC
permission dialogs (e.g. "mise wants to access Documents") that hung
headless sessions and caused backup failures.

Also adds `mise install` to the ansible role so borgmatic installation
is fully managed, and pins the version in both mise.toml and the role
defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:23:46 -07:00
30ed018fd8 Update prowler image tag to v5.23.0-7c1cd11 [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:51:26 -07:00
7c1cd11e45 Upgrade Prowler to 5.23.0, remove registry workaround (#336)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (prowler) (push) Successful in 36s
## Summary

- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that upstream fix (PR prowler-cloud/prowler#10470) is released

The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.

## Test plan

- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #336
2026-04-14 13:45:28 -07:00
6b690eb033 Review CC sso-gated-admin-tools: scope to ArgoCD only
Removed Grafana from the control description — no Prowler finding
references it. Tightened scope to match actual usage (ArgoCD wildcard
RBAC mute). Added workflow-bot scoping note.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:07:52 -07:00
be30668eef Automate Prowler MANUAL finding verification (#335)
## Summary
- Adds automated node-level verification to `review-compliance-reports`: kubelet file perms/ownership, kubelet config args, etcd CA separation, RBAC cluster-admin bindings
- Mutes the 14 MANUAL Prowler findings via new `manual-node-checks.yaml` mutelist file
- New `node-config-automated-verification` compensating control documents the approach
- Script fails loudly (red FAIL + verdict panel) if any check deviates from expected values

## Test plan
- [x] `mise run review-compliance-reports` — all 12 node checks PASS
- [x] Injected bad expected value (perms 400 vs actual 600) — FAIL rendered correctly
- [x] Fixed colon-in-binding-name bug (kubeadm:cluster-admins) with tab-separated jsonpath
- [ ] After merge: sync prowler mutelist ConfigMap and verify next scan shows 0 MANUAL findings

## Note
Prowler coverage is minikube-indri only — ringtail/k3s is a known gap tracked separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #335
2026-04-14 13:00:44 -07:00
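The colon-in-binding-name fix mentioned in be30668eef can be sketched as follows. The two-field layout (binding name, then role) is an assumption; the commit only states that `kubeadm:cluster-admins` broke the parsing and that tab-separated jsonpath output fixed it.

```python
def parse_colon(line: str) -> tuple[str, str]:
    """Fragile: treats the first colon as the field separator."""
    name, role = line.split(":", 1)
    return name, role


def parse_tab(line: str) -> tuple[str, str]:
    """Robust for names containing colons: fields are tab-separated."""
    name, role = line.split("\t", 1)
    return name, role


# Assumed output lines for the same binding in each format.
colon_line = "kubeadm:cluster-admins:cluster-admin"
tab_line = "kubeadm:cluster-admins\tcluster-admin"

print(parse_colon(colon_line))  # name is cut off at the embedded colon
print(parse_tab(tab_line))      # name survives intact
```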
Forgejo Actions
8c2f035e6d Update docs release to v1.15.6
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:46:42 -07:00
04b44b350b Add changelog for ArgoCD token rotation after DR (tag: v1.15.6)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:45:00 -07:00
Forgejo Actions
f2514a6f02 Update docs release to v1.15.5
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:29:27 -07:00
9d85c97b9b Update forgejo-runner kustomization tag to main-branch image (tag: v1.15.5)
C0 follow-up: switch from branch-built tag to main-built v12.7.3-0e93cc0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:10:36 -07:00
0e93cc08b4 Build forgejo-runner container locally (#334)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (forgejo-runner) (push) Successful in 1m21s
## Summary
- Add native Dagger `container.py` for forgejo-runner (Go + Alpine runtime, static binary with CGO for SQLite)
- Update kustomization to point to local registry image (tag is placeholder until CI builds)
- Uses existing `clone_from_forge("forgejo-runner", ...)` mirror

## Test plan
- [x] `dagger call build --src=. --container-name=forgejo-runner` passes locally
- [ ] CI container build from branch succeeds
- [ ] Update kustomization tag to built image, deploy from branch via ArgoCD `--revision`
- [ ] Verify runner registers and picks up jobs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #334
2026-04-14 11:06:36 -07:00
223b134776 Document uv.lock as the source of devpi dependency in Dagger builds
The lockfile bakes in devpi URLs — Dagger does a locked install, not
fresh resolution. This is the mechanism behind the cold-cache failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:41:45 -07:00
ccaef4c1a7 Document devpi cold cache failure mode and deploy teslamate v3.0.0-08c698e
After a DR rebuild, devpi's empty cache causes race conditions under
concurrent load — metadata is served but wheel files 404. Also deploys
the first container.py-built teslamate image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:38:06 -07:00
08c698e833 Migrate teslamate to native Dagger container.py (#333)
Some checks failed
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (teslamate) (push) Failing after 6s
## Summary
- Replace legacy Dockerfile with native Dagger `container.py` build
- Two-stage pipeline: Elixir+Node builder, Debian slim runtime
- Uses shared helpers (`clone_from_forge`, `oci_labels`)
- Delete old Dockerfile (pipeline auto-discovers container.py)
- Update build-container-image docs and mark service reviewed

## Test plan
- [x] `dagger call build --src=. --container-name=teslamate` succeeds locally
- [ ] CI container build passes
- [ ] Deploy from branch and verify teslamate starts cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #333
2026-04-14 07:20:52 -07:00
4ca0630d76 Review enforce-tag-immutability doc: add review date and zot reference link
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:00:55 -07:00
d7c3c687f4 Document DR rebuild procedure and update restart-indri
- New how-to: rebuild-minikube-cluster with full bootstrap procedure
  validated during 2026-04-13 DR event
- Update restart-indri: warn about minikube delete, macOS permission
  dialog on first Tailscale SSH, forgejo_actions_secrets dep cycle
- Update disaster-recovery reference: link to rebuild procedure
- Update CLAUDE.md: never run minikube delete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:07:54 -07:00
405dab8b59 Add changelog fragments for DR recovery work
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:59:16 -07:00
cd5b6b63f7 Add paperless DB to borgmatic backups
Discovered during DR that paperless was the only service DB not backed
up by borgmatic. Uses same blumeops-pg cluster on port 5432.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:58:06 -07:00
2d2d495f95 Fix paperless redis: use upstream valkey instead of amd64-only nix image
The authentik-redis image is nix-built on ringtail (amd64 only) and was
previously running under QEMU emulation on arm64 minikube. Discovered
during DR recovery when fresh minikube lacked binfmt registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:48:20 -07:00
fca3010042 Hints about service version tracking
2026-04-13 08:40:49 -07:00
22a417ac3c Oops, looks like a log file got lost, nbd
2026-04-13 08:36:20 -07:00
f61bb4f2e7 Add uv.lock for version pinning of dagger pipeline
2026-04-13 08:35:01 -07:00
b5551e227e Route Dagger build telemetry to Tempo
The Dagger engine's internal OTLP proxy returns 500 on /v1/metrics when
there's no real backend, causing ~9s retry warnings per pipeline step.
Point OTEL_EXPORTER_OTLP_ENDPOINT at Tempo to give it a real endpoint.
Also removes the stale os.environ workaround from main.py (the SDK
initializes telemetry before our module loads, so it had no effect).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:27:12 -07:00
ab834b641a Fix OTEL metrics exporter warnings in Dagger builds
The Dagger engine shim sets OTEL_METRICS_EXPORTER before our module
loads, so os.environ.setdefault was a no-op. Switch to a hard override.
Remove the redundant workflow-level env var since the fix belongs in
the module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:11:15 -07:00
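The `setdefault` behavior that ab834b641a refers to can be shown with a plain dict standing in for `os.environ` (both follow the same mapping semantics). The preset value `"otlp"` is an assumption; the commit only says the shim sets the variable before the module loads. Note that b5551e227e, earlier in this log, reports the in-module workaround still had no effect because the SDK initializes telemetry first, so this sketch covers the mapping semantics only.

```python
VAR = "OTEL_METRICS_EXPORTER"


def apply_setdefault(env: dict[str, str]) -> dict[str, str]:
    """Only sets the variable when it is absent; a preset value wins."""
    result = dict(env)
    result.setdefault(VAR, "none")
    return result


def apply_override(env: dict[str, str]) -> dict[str, str]:
    """Unconditionally replaces whatever was set before."""
    result = dict(env)
    result[VAR] = "none"
    return result


# Assumed state after the engine shim has run.
preset = {VAR: "otlp"}

print(apply_setdefault(preset)[VAR])  # preset value is kept
print(apply_override(preset)[VAR])    # preset value is replaced
```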
db6d8af8b1 Update grafana-sidecar image tag to v2.6.0-61fcd5d (merge build)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:02:39 -07:00
61fcd5d70a Upgrade grafana-sidecar 1.28.0 → 2.6.0 + container.py port (#332)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dagger (grafana-sidecar) (push) Successful in 1m50s
## Summary

- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note

## Test plan

- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`

Reviewed-on: #332
2026-04-13 07:57:13 -07:00
6455d93cb3 Review local-registry control: fix inaccurate description, enumerate exceptions
The control claimed all images came from the private registry, but 12+
services pull from external public registries. Updated description to
reflect reality and catalogued external-image categories in notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:59:37 -07:00
6e60287e99 Doc review: delete install-dagger-on-nix-runner, add service-versions ref card
Outdated leaf card removed; zot.md now links to new service-versions
reference card instead. Added reverse link from review-services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:52:38 -07:00
8d80a4a3a5 Rewrite runner-logs: API-based log fetching, multi-repo support
Replace broken SSH+filesystem log retrieval with Forgejo web API
endpoint. Fix CLI to use run numbers (not task IDs), add --repo
for querying any forge repo (e.g. sporks), --limit/-n for listing
size. Document runner-logs as the way to verify build success in
CLAUDE.md and container build docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:42:58 -07:00
a18ec9d958 Update miniflux to main image tag, disable OTEL metrics in Dagger module
Point miniflux kustomization at the main-built v2.2.19-138e23d image
(replacing the branch tag). Disable the OTLP metrics exporter at module
import time to prevent ~11s retry delays in CI — the env var must be set
inside the module, not the runner shell, because the SDK runs inside the
Dagger engine container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:59:32 -07:00
138e23d525 Miniflux 2.2.19 + container.py migration + ty typechecker (#331)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 1m3s
## Summary

- Upgrade miniflux from 2.2.17 to 2.2.19 (security hardening, performance improvements)
- Migrate miniflux from Dockerfile to native Dagger container.py build
- Refactor `alpine_runtime()` helper to support existing users (nobody/65534)
- Add `ty` (Astral) Python typechecker to prek hooks

## Test plan

- [ ] `dagger call build --src=. --container-name=miniflux` succeeds
- [ ] `dagger call container-version --container-name=miniflux` returns 2.2.19
- [ ] `mise run container-version-check` passes
- [ ] `ty check` passes cleanly
- [ ] `prek run --all-files` passes
- [ ] CI builds container successfully
- [ ] Miniflux healthcheck passes after deploy from branch

Reviewed-on: #331
2026-04-12 08:54:32 -07:00
dc5bffdd97 Update ringtail flake inputs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:14:46 -07:00
c06eccc61c Review hosts.md: add last-reviewed, normalize links, add reference tag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:06:53 -07:00
94c937d588 Disable OTLP metrics exporter in CI, update navidrome to main tag
The Dagger Python SDK's OTLP metrics exporter hits a non-functional
local endpoint (500s), burning ~9s per retry cycle. Set
OTEL_METRICS_EXPORTER=none in the build-dagger CI job.

Also update navidrome kustomization to the main-SHA tag (c86b5d7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:26:25 -07:00
c86b5d7772 Native Dagger container builds + Navidrome v0.61.1 (#330)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (navidrome) (push) Successful in 22m26s
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers

## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.

## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```

## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me

## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.

Reviewed-on: #330
2026-04-11 17:11:56 -07:00
4fc0192731 Track Fly.io proxy component versions in service-versions.yaml
Add flyio-tailscale (v1.94.1), flyio-nginx (1.29.6-alpine), and
flyio-alloy (v1.14.1) entries with new `fly` service type so future
upgrades go through the service-review workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:40:57 -07:00
e02305e72d Pin Fly.io Tailscale to v1.94.1 to fix MagicDNS regression in v1.96.5
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m20s
Tailscale :stable pulled v1.96.5 during last deploy, which returns
SERVFAIL for tailnet DNS names (no upstream resolvers set). This broke
all public routing (forge/docs/cv.eblu.me) through the Fly proxy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:32:38 -07:00
b08b1a833f Fix services-check to show all firing alerts per alert name
check_alert() used head -1 to display only the first firing instance,
silently swallowing additional alerts (e.g. frigate pod-not-ready was
hidden behind ollama).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:10:09 -07:00
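The effect of the `head -1` bug fixed in b08b1a833f, re-expressed in Python rather than the script's own shell. The alert name and the record shape are hypothetical; only the ollama and frigate instances come from the commit message.

```python
# Hypothetical firing alerts sharing one alert name.
firing = [
    {"alertname": "PodNotReady", "instance": "ollama"},
    {"alertname": "PodNotReady", "instance": "frigate"},
]


def first_instance_only(alerts: list[dict], name: str) -> list[str]:
    """Mimics `head -1`: keeps only the first matching instance."""
    return [a["instance"] for a in alerts if a["alertname"] == name][:1]


def all_instances(alerts: list[dict], name: str) -> list[str]:
    """Reports every firing instance for the alert name."""
    return [a["instance"] for a in alerts if a["alertname"] == name]


print(first_instance_only(firing, "PodNotReady"))  # frigate is hidden
print(all_instances(firing, "PodNotReady"))        # both are shown
```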
a75f28e073 Fix fly.io proxy rate limit to key on real client IP
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m24s
The general rate limit zone used $binary_remote_addr (Fly's internal
proxy IP), causing all external clients to share one bucket. Switch to
$http_fly_client_ip to match forge_auth's correct behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:00:33 -07:00
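Why the choice of key matters in a75f28e073 can be seen by counting requests per rate-limit key. This is not nginx's `limit_req` algorithm, only an illustration of bucket keying; the request records and IP addresses are made up (documentation ranges), with `remote_addr` standing in for `$binary_remote_addr` and `fly_client_ip` for `$http_fly_client_ip`.

```python
from collections import Counter
from typing import Callable


def bucket_counts(requests: list[dict], key_fn: Callable[[dict], str]) -> Counter:
    """Count how many requests land in each rate-limit bucket."""
    return Counter(key_fn(r) for r in requests)


# Hypothetical requests: every one arrives via the same internal proxy address.
requests = [
    {"remote_addr": "10.0.0.1", "fly_client_ip": "198.51.100.7"},
    {"remote_addr": "10.0.0.1", "fly_client_ip": "203.0.113.9"},
    {"remote_addr": "10.0.0.1", "fly_client_ip": "203.0.113.9"},
]

by_remote = bucket_counts(requests, lambda r: r["remote_addr"])
by_client = bucket_counts(requests, lambda r: r["fly_client_ip"])

print(dict(by_remote))  # one shared bucket for all clients
print(dict(by_client))  # one bucket per real client
```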
40556e5a2d Review gandi.md: add missing forge.eblu.me CNAME record
The Pulumi code has had a forge.eblu.me CNAME since it was added, but the
doc's DNS table only listed docs and cv. Also fixed the __main__.py
description to mention CNAMEs alongside A records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:54:46 -07:00
5757df115d Upgrade ollama from 0.17.5 to 0.20.4
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:42:05 -07:00
22fc615a28 Update paperless image tag to main build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:01:02 -07:00
07f52e9488 Deploy Paperless-ngx document management (#328)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (paperless) (push) Successful in 9s
## Summary

- Add paperless-ngx (v2.20.13) as a new ArgoCD-managed service on indri
- Dockerfile built from forge mirror (`mirrors/paperless-ngx`), multi-stage with s6-overlay
- PostgreSQL database via `blumeops-pg` CNPG cluster, Redis sidecar for Celery
- NFS document storage on sifaka (`/volume1/paperless`)
- Authentik OIDC SSO via baked JSON blob from 1Password
- Caddy route at `paperless.ops.eblu.me`
- 1Password item "Paperless (blumeops)" created with all secrets

## Files

- `containers/paperless/Dockerfile` — multi-stage build
- `argocd/manifests/paperless/` — full k8s manifest set
- `argocd/apps/paperless.yaml` — ArgoCD application
- `argocd/manifests/databases/` — CNPG role + ExternalSecret
- `ansible/roles/caddy/defaults/main.yml` — Caddy route
- `service-versions.yaml` — version tracking entry
- `docs/reference/services/paperless.md` — reference card

## Remaining deploy steps

1. Build container: `mise run container-build-and-release paperless`
2. Update kustomization.yaml `newTag` with actual image tag
3. Create Authentik application/provider for paperless
4. Create `paperless` database on blumeops-pg
5. Sync ArgoCD apps, then sync paperless from branch
6. Provision Caddy: `mise run provision-indri -- --tags caddy`
7. Verify at https://paperless.ops.eblu.me

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #328
2026-04-08 17:54:12 -07:00
e04455c911 Add changelog fragment for adding-a-service tutorial review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:29:54 -07:00