Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's
redirect_uris so `argocd login --sso` works. Loopback redirect is the
RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the
code-interception risk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Built from main in run #516 after #339 merged. Follows the navidrome
kustomization convention (deployment image = local ref + :kustomized,
kustomization override = newTag only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09)
to the equivalent image rebuilt by CI from main. Functionally identical;
closes the bootstrap loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Points the k8s Forgejo runner label at the locally-bootstrapped
runner-job-image built from the Alpine container.py on this branch.
Once merged, CI will rebuild the same image from the same SHA.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets
## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits
Reviewed-on: #338
All 7 ArgoCD containers had no resource limits, allowing them to consume
unlimited CPU/memory during node pressure events. This contributed to
cluster-wide probe timeout cascades on minikube-indri.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The nix-built Alloy image sets User=65534 (nobody). Even with
privileged: true, a non-root user gets no effective capabilities
(CapEff=0). Override with runAsUser: 0 so Beyla gets CAP_BPF and
CAP_SYS_ADMIN needed for eBPF instrumentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)
## Trade-off
DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.
## Still TODO on this branch
- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #337
Points kustomization at v3.8.2-7a42aeb, the first image built from the
new container.py (replacing the Dockerfile).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update kustomization image tags to the new container.py-built images
(v4.1.1-r1-2c483ce, v1.0.1-2c483ce).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that upstream fix (PR prowler-cloud/prowler#10470) is released
The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.
## Test plan
- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #336
## Summary
- Adds automated node-level verification to `review-compliance-reports`: kubelet file perms/ownership, kubelet config args, etcd CA separation, RBAC cluster-admin bindings
- Mutes the 14 MANUAL Prowler findings via new `manual-node-checks.yaml` mutelist file
- New `node-config-automated-verification` compensating control documents the approach
- Script fails loudly (red FAIL + verdict panel) if any check deviates from expected values
## Test plan
- [x] `mise run review-compliance-reports` — all 12 node checks PASS
- [x] Injected bad expected value (perms 400 vs actual 600) — FAIL rendered correctly
- [x] Fixed colon-in-binding-name bug (kubeadm:cluster-admins) with tab-separated jsonpath
- [ ] After merge: sync prowler mutelist ConfigMap and verify next scan shows 0 MANUAL findings
## Note
Prowler coverage is minikube-indri only — ringtail/k3s is a known gap tracked separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #335
## Summary
- Add native Dagger `container.py` for forgejo-runner (Go + Alpine runtime, static binary with CGO for SQLite)
- Update kustomization to point to local registry image (tag is placeholder until CI builds)
- Uses existing `clone_from_forge("forgejo-runner", ...)` mirror
## Test plan
- [x] `dagger call build --src=. --container-name=forgejo-runner` passes locally
- [ ] CI container build from branch succeeds
- [ ] Update kustomization tag to built image, deploy from branch via ArgoCD `--revision`
- [ ] Verify runner registers and picks up jobs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #334
After a DR rebuild, devpi's empty cache causes race conditions under
concurrent load — metadata is served but wheel files 404. Also deploys
the first container.py-built teslamate image.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The authentik-redis image is nix-built on ringtail (amd64 only) and was
previously running under QEMU emulation on arm64 minikube. Discovered
during DR recovery when fresh minikube lacked binfmt registration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note
## Test plan
- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`
Reviewed-on: #332
Point miniflux kustomization at the main-built v2.2.19-138e23d image
(replacing the branch tag). Disable the OTLP metrics exporter at module
import time to prevent ~11s retry delays in CI — the env var must be set
inside the module, not the runner shell, because the SDK runs inside the
Dagger engine container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dagger Python SDK's OTLP metrics exporter hits a non-functional
local endpoint (500s), burning ~9s per retry cycle. Set
OTEL_METRICS_EXPORTER=none in the build-dagger CI job.
Also update navidrome kustomization to the main-SHA tag (c86b5d7).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers
## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.
## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```
## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me
## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.
Reviewed-on: #330
preview.quality was at the top level (invalid); moved under record
with a valid preset (very_low). Also fix services-check to catch
Grafana "Alerting (NoData)" state which was silently passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previews are ~4MB/hour at default quality (CRF 1), served over NFS from
sifaka. Reducing to CRF 8 shrinks preview files to improve review page
load times when scrubbing older footage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip redundant "unifi-poller-" prefix from generated slugs, bringing
UIDs from 45-48 chars down to 32-35 chars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C0 follow-up to #327: update from branch-SHA tags to main-SHA tags
after squash-merge rebuild.
indri: v2.18.0-f59f885
ringtail: v2.18.0-f59f885-nix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Build kube-state-metrics v2.18.0 locally from forge mirror, replacing upstream `registry.k8s.io` image
- Dockerfile (two-stage Go build) for indri/minikube
- default.nix (buildGoModule + buildLayeredImage) for ringtail/k3s
- Both kustomization files updated with `newName` pointing to local registry
## Verification
- [x] Nix build succeeded on ringtail (`nix-build` → 10-layer image)
- [x] Dockerfile build succeeded locally (`dagger call build` → ~2min)
- [x] `container-version-check --all-files` passes (2.18.0 consistent across Dockerfile, nix, service-versions.yaml)
- [ ] CI builds container images from this branch
- [ ] Update kustomization `newTag` with SHA-tagged version from CI
- [ ] ArgoCD sync on both clusters
## Test plan
- Trigger CI build: `mise run container-build-and-release kube-state-metrics`
- Verify tags: `mise run container-list kube-state-metrics`
- Update newTag in kustomization files with CI-produced tag
- Sync ArgoCD on indri: `argocd app sync kube-state-metrics`
- Sync ArgoCD on ringtail: `argocd app sync kube-state-metrics --context=k3s-ringtail` (note: argocd uses its own auth, not kubectl context)
- Verify metrics still flowing to Prometheus
Reviewed-on: #327
Worker forks 4 Dramatiq processes each loading the full Django app
(~250MB each), hitting the 1Gi limit on startup. Ringtail has ample
RAM headroom.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
teslamate had superuser on the shared blumeops-pg cluster (which also
hosts miniflux and authentik). Downgraded to plain database owner with
extension ownership (cube, earthdistance) transferred manually so it
can still ALTER EXTENSION UPDATE. earthdistance is untrusted in PG so
DROP+CREATE would need temporary superuser escalation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Patch upgrade with bug fixes (diff normalization, installation ID cache).
Pin the upstream manifest URL to commit SHA for supply chain integrity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves 4 unmuted Prowler core_seccomp_profile_docker_default
findings on alloy, immich-server, immich-machine-learning, and
immich-valkey.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>