Replace Dockerfile with container.py for native Dagger builds.
Bump devpi-server 6.19.1→6.19.3, devpi-web 5.0.1→5.0.2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the hand-written Dockerfile with container.py using the shared
alpine_runtime helper, which bumps the base image from Alpine 3.22 to 3.23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Dockerfiles with native container.py for both transmission and
transmission-exporter. Update base images (Alpine 3.23, Python 3.14) and
pin uv to 0.11.6 instead of :latest.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LaunchAgents now call borgmatic directly at its mise-installed path
instead of routing through `mise x`, which triggered macOS TCC
permission dialogs (e.g. "mise wants to access Documents") that hung
headless sessions and caused backup failures.
Also adds `mise install` to the ansible role so borgmatic installation
is fully managed, and pins the version in both mise.toml and the role
defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
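For illustration, a minimal sketch of a LaunchAgent plist that invokes borgmatic by absolute path, so launchd never starts mise itself. It is generated with Python's standard `plistlib`; the label, install path, arguments, and 2:00 schedule are assumptions, not the Ansible role's actual template.

```python
import plistlib
from pathlib import Path

# Hypothetical install location; `mise which borgmatic` reports the real path.
BORGMATIC_PATH = Path.home() / ".local/share/mise/installs/pipx-borgmatic/2.1.3/bin/borgmatic"


def borgmatic_launch_agent(borgmatic_path: str, hour: int = 2, minute: int = 0) -> bytes:
    """Build a LaunchAgent plist that runs borgmatic directly (no `mise x` wrapper)."""
    agent = {
        "Label": "hypothetical.borgmatic",  # placeholder label
        # The first argument is the binary itself, so no mise process is spawned
        # (and no TCC prompt is attributed to mise).
        "ProgramArguments": [borgmatic_path, "--verbosity", "1"],
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        "RunAtLoad": False,
    }
    return plistlib.dumps(agent)


if __name__ == "__main__":
    print(borgmatic_launch_agent(str(BORGMATIC_PATH)).decode())
```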
## Summary
- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that the upstream fix (PR prowler-cloud/prowler#10470) is released
The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.
## Test plan
- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #336
## Summary
- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note
## Test plan
- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`
Reviewed-on: #332
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers
## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.
## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```
## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me
## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.
Reviewed-on: #330
Add flyio-tailscale (v1.94.1), flyio-nginx (1.29.6-alpine), and
flyio-alloy (v1.14.1) entries with new `fly` service type so future
upgrades go through the service-review workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug-fix release with web UI fixes, LDAP page size, and SAML SLO
redirect. Also bumps client-go to v3.2026.2.1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Patch upgrade with bug fixes (diff normalization, installation ID cache).
Pin the upstream manifest URL to commit SHA for supply chain integrity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
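A tag or branch name in a URL can be moved upstream, whereas a full commit SHA identifies exactly one tree, which is why the pin helps supply chain integrity. A hypothetical lint for that property is sketched below; it only recognizes 40-character SHA-1 object IDs, and the example URLs are placeholders, not the manifest URL this commit pins.

```python
import re
from urllib.parse import urlparse

# A full git SHA-1 commit ID is exactly 40 lowercase hex characters.
_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def is_pinned_to_commit(url: str) -> bool:
    """Return True if any path segment of `url` is a full 40-character commit SHA."""
    segments = urlparse(url).path.split("/")
    return any(_SHA_RE.match(seg) for seg in segments)
```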
Replace the Helm chart deployment with plain kustomize manifests following
the Authentik pattern (separate deployments per component). Consolidate
the immich-storage ArgoCD app into the main immich app. Add no-helm-policy
doc establishing kustomize as the standard deployment mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Add `containers/tempo/Dockerfile` — two-stage Go build from forge mirror, modeled on loki
- Switch kustomization from upstream `grafana/tempo` to `registry.ops.eblu.me/blumeops/tempo`
- Bump Tempo 2.10.1 → 2.10.3
## Test plan
- [ ] Kick off container build via `mise run container-build-and-release tempo`
- [ ] Update kustomization `newTag` with built image tag
- [ ] Deploy from branch: `argocd app set tempo --revision local-tempo-container && argocd app sync tempo`
- [ ] Verify Tempo health: `curl tempo.ops.eblu.me/ready`
- [ ] Verify traces flowing in Grafana Tempo datasource
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #323
## Summary
- Add `nixpkgs-services` flake input pinned to a specific nixpkgs commit, with an overlay that pulls `forgejo-runner`, `snowflake`, and `k3s` from it instead of the rolling `nixpkgs`
- Dagger `flake-update` pipeline now excludes `nixpkgs-services` via `--exclude`
- Fix stale nix-container-builder version in service-versions.yaml (was 12.6.4, actually running 12.7.2)
- Add k3s and minikube to service-versions.yaml tracking
- Document the pinning approach in review-services how-to and ringtail reference
## Motivation
During service review, we discovered that flake updates had silently upgraded forgejo-runner from 12.6.4 to 12.7.2 without updating service-versions.yaml. This "sneak-in upgrade" bypasses the service review process. The overlay ensures these three services change versions only deliberately.
## Test plan
- [ ] Verify `nix flake update` from `nixos/ringtail/` does not change `nixpkgs-services` lock entry
- [ ] Verify `mise run provision-ringtail` builds successfully with the overlay
- [ ] Confirm running service versions unchanged after deploy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #321
Patch upgrade that picks up the idempotent FetchTask API, an offline
registration fix, a cloudflare/circl security dependency update, and a
custom gRPC user-agent. No config defaults changed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix stale CV service doc (URL, forge domain, container tag) and add
guidance for reviewing build-time dependencies in private forge repos
during service reviews.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minor release with new widgets (Tracearr, SparklyFitness), Seerr rename,
and dependency bumps. No breaking changes for our config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade External Secrets Operator from v1.3.2 (helm-chart-2.0.0) to v2.2.0
- Migrate from Helm chart deployment to static kustomize manifests, matching the repo's kustomize-first pattern
- Merge separate `-config` ArgoCD apps into the main operator apps (6 → 4 apps)
- Clean up Helm-specific labels (`helm.sh/chart`, `managed-by: Helm`)
- Update README example from v1beta1 to v1 API
## Breaking changes assessment
Low risk — v2.0.0 removed Alibaba and Device42 providers (we use neither). No templating changes affect us. All ExternalSecrets already use v1 API.
## Deployment steps
1. Sync CRDs first on both clusters (new CRD version)
2. Sync operator apps (now kustomize-based)
3. Verify ClusterSecretStore and all ExternalSecrets are healthy
4. Delete orphaned config apps: `argocd app delete external-secrets-config` and `-config-ringtail`
5. `mise run services-check`
Reviewed-on: #312
## Summary
- Add Snowflake proxy as a native systemd service on ringtail (NixOS)
- Uses `pkgs.snowflake` from nixpkgs (v2.11.0)
- Hardened systemd unit with DynamicUser, ProtectSystem=strict, 512MB memory limit
- Prometheus metrics enabled on localhost:9999
## What is Snowflake?
A Tor pluggable transport that helps censored users reach the Tor network via WebRTC. **This is NOT a Tor exit node** — traffic exits through Tor exit nodes operated by others. The proxy operator cannot see traffic content (double-encrypted) and destination servers never see the proxy's IP.
## Changes
- `nixos/ringtail/configuration.nix` — new systemd service definition
- `docs/reference/services/snowflake-proxy.md` — service reference card
- `docs/reference/infrastructure/ringtail.md` — updated systemd services section
- `service-versions.yaml` — added entry (type: nixos)
## Deploy plan
After review, deploy via `mise run provision-ringtail`. Service starts automatically.
## Test plan
- [ ] `mise run provision-ringtail` succeeds
- [ ] `ssh ringtail 'systemctl status snowflake-proxy'` shows active
- [ ] `ssh ringtail 'journalctl -u snowflake-proxy --no-pager -n 20'` shows broker connections
- [ ] `ssh ringtail 'curl -s localhost:9999/metrics'` returns Prometheus metrics
Reviewed-on: #311
## Summary
- Replace upstream `docker.io/library/redis:7-alpine` (Redis 7.4.8) with a nix-built container using Redis 8.2.3 from nixpkgs
- Introduce an **attached service pattern**: `parent` field in service-versions.yaml, `<parent>-<component>` naming convention, and `assert pkgs.redis.version == version` in default.nix to prevent silent version drift on `flake.lock` updates
- Document the pattern in [[review-services]] so future attached services slot in cleanly
- Backfill `parent: grafana` on existing `grafana-sidecar` entry
## Version drift protection
1. `flake.lock` update bumps nixpkgs redis → `assert` in `default.nix` breaks `nix-build`
2. Developer updates `version` in `default.nix` → prek's `container-version-check` demands matching `service-versions.yaml` update
3. Both must agree before commit succeeds
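Step 2 hinges on comparing the `version` binding in `default.nix` with the tracked version. A hypothetical sketch of such a comparison follows; the regex and the flattened `tracked` mapping (standing in for a parsed service-versions.yaml) are assumptions, and this is not prek's actual `container-version-check`.

```python
import re

# Matches a Nix binding such as:  version = "8.2.3";
_NIX_VERSION_RE = re.compile(r'^\s*version\s*=\s*"([^"]+)"\s*;', re.MULTILINE)


def nix_version(default_nix: str) -> str:
    """Extract the first `version = "...";` binding from a default.nix source string."""
    match = _NIX_VERSION_RE.search(default_nix)
    if match is None:
        raise ValueError('no `version = "...";` binding found')
    return match.group(1)


def versions_agree(default_nix: str, tracked: dict[str, str], service: str) -> bool:
    """True if default.nix's version matches the tracked entry for `service`.

    `tracked` is a hypothetical flattened view of service-versions.yaml
    (service name -> version); the real file layout may differ.
    """
    return nix_version(default_nix) == tracked.get(service)
```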
## Test plan
- [ ] Build container from branch on ringtail (`mise run container-build-and-release authentik-redis`)
- [ ] Update kustomization `newTag` to branch-built image tag
- [ ] Sync authentik ArgoCD app from branch (`argocd app set authentik --revision localize-redis && argocd app sync authentik`)
- [ ] Verify Authentik login, session persistence, and task queue still work
- [ ] After merge: C0 follow-up to update `newTag` to the main-built image tag
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #309
Bump from RC to the latest stable release (security fixes for the config
endpoint and cross-camera auth). Add the new 0.17 motion retention tier at
365 days and reduce continuous retention from 180 to 30 days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v1.96.3 exists as a GitHub release but Docker Hub images for both
tailscale/tailscale and tailscale/k8s-operator haven't been published
yet (v1.94.2 is still latest). Revert the image tags; the fly/start.sh
`tailscale wait` improvement and review date stamps are retained.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Bump Tailscale operator, proxy containers, and init containers from v1.94.2 to v1.96.3 across both clusters (indri + ringtail via shared base kustomization)
- Replace hand-rolled `until tailscale status` polling loop in `fly/start.sh` with `tailscale wait --timeout 60s` (new in v1.96.2)
- Stamp kube-state-metrics review date (already current at v2.18.0)
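The hand-rolled loop being replaced follows a common poll-until-ready-or-timeout pattern. A minimal Python sketch of that pattern, for comparison with the built-in `tailscale wait --timeout 60s`; it is not the contents of `fly/start.sh`. The `ready` predicate is an assumption: it could, for example, shell out to `tailscale status` and check its exit code.

```python
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll ready() until it returns True or `timeout` seconds elapse; return whether it became ready."""
    deadline = time.monotonic() + timeout
    while True:
        if ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A built-in wait command removes the need to pick `interval` and to decide what "ready" means, both of which a hand-rolled loop has to guess.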
## Notable upstream changes (v1.94.2 → v1.96.3)
- Go upgraded from 1.25 to 1.26
- `tailscale wait` command — blocks until daemon is running + interface has IP
- AuthKey policy now applies only when users are not logged in (behavioral change)
- Peer Relay improvements (metrics, EC2 IMDS, UDP socket scaling)
- UPnP stability fixes
## Deploy plan
1. Merge PR
2. Sync tailscale-operator on indri: `argocd app sync tailscale-operator`
3. Sync tailscale-operator on ringtail: `argocd app sync tailscale-operator-ringtail --server ringtail...`
4. Verify proxy pods roll with new image: `kubectl --context=minikube-indri -n tailscale get pods`
5. Verify ingress connectivity (spot-check a few `*.tail8d86e.ts.net` services)
6. Rebuild + deploy Fly proxy container (separate step, picks up `tailscale wait` change)
## Test plan
- [ ] ArgoCD diff looks clean for both apps before sync
- [ ] Proxy pods on indri come up healthy with v1.96.3 images
- [ ] Proxy pods on ringtail come up healthy with v1.96.3 images
- [ ] Tailscale ingress services remain reachable (e.g., grafana, prometheus)
- [ ] Fly proxy rebuild deploys successfully with `tailscale wait`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #304
## Summary
- Deploy UnPoller as a k8s service on indri to export UniFi controller metrics to Prometheus
- Custom-built container from forge mirror (`containers/unpoller/Dockerfile`)
- Credentials pulled from 1Password via external-secrets
- Prometheus scrape job added, docs and service-versions updated
## Test plan
- [ ] Build container: `mise run container-release unpoller v2.34.0`
- [ ] Update kustomization tag with built image tag
- [ ] Deploy from branch: `argocd app set unpoller --revision feature/unpoller && argocd app sync unpoller`
- [ ] Verify pod connects to UX7 controller (check logs)
- [ ] Confirm `unpoller` target appears in Prometheus
- [ ] Query `unifi_` metrics in Grafana
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #298
## Summary
- Upgraded borgmatic from 2.0.13 to 2.1.3 on indri (via mise/pipx)
- Key changes: improved borg warning handling, memory/performance improvements, `source_directories_must_exist` now defaults to true (already set in our config)
- Verified: config validates, dry-run passed against both sifaka (local) and borgbase (offsite) repos
## Borg Warnings Investigation
The main concern was 2.1.0's change to treat borg warnings as errors. In 2.1.3 this was partially reverted — "file not found" warnings (exit code 107) are back to being warnings. Our config already sets `source_directories_must_exist: true`, and all four source directories were verified present on indri.
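For context, the distinction relies on borg's documented return-code ranges. A sketch of a classifier, assuming borg 1.4's "modern" exit-code scheme (0 success; 1 or 100 to 127 warning; 2 to 99 error; 128+N killed by signal N). This illustrates the ranges only and is not borgmatic's implementation.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    ERROR = "error"
    SIGNAL = "killed by signal"


def classify_borg_exit(code: int) -> Outcome:
    """Classify a borg return code using the modern exit-code ranges."""
    if code == 0:
        return Outcome.SUCCESS
    if code == 1 or 100 <= code <= 127:
        return Outcome.WARNING  # includes 107, the "file not found" case
    if 2 <= code <= 99:
        return Outcome.ERROR
    if code >= 128:
        return Outcome.SIGNAL  # 128 + N means terminated by signal N
    raise ValueError(f"unexpected borg exit code {code}")
```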
## Test plan
- [x] `borgmatic --version` confirms 2.1.3
- [x] `borgmatic config validate` passes
- [x] `borgmatic create --dry-run` succeeds against both repositories
- [x] All source directories verified present on indri
- [ ] Verify next scheduled backup (2:00 AM) completes successfully
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #297
## Summary
- Upgrade zot OCI registry from v2.1.13 to v2.1.15 on indri
- Addresses CVE-2025-30204 (golang-jwt excessive memory allocation during header parsing) and an open redirect via `callback_ui`
- No config template changes needed (externalUrl is auto-allowlisted)
- Requires Go 1.25.7 (bump from 1.25.6 via mise)
## Data Safety
- Data directory ~/erichblume/zot is NOT touched during build or deploy
- No schema migrations in v2.1.14 or v2.1.15
- Storage format remains OCI spec 1.1.0
## Deployment Steps
- [ ] SSH to indri: bump Go to 1.25.7 via `mise use go@1.25.7`
- [ ] Fetch and checkout v2.1.15 in ~/code/3rd/zot
- [ ] Build: `mise x -- make binary`
- [ ] Restart LaunchAgent
- [ ] Verify: `curl -s http://localhost:5050/v2/` returns 200
- [ ] Verify: `curl -s https://registry.ops.eblu.me/v2/_catalog` lists repos
- [ ] Verify: `mise run services-check`
Reviewed-on: #293
Mosquitto has been dormant since frigate-notify switched from MQTT to
webapi polling (529ba10). Tear down live infra (ArgoCD app, namespace)
and remove all manifests, service-versions entry, services-check, and
doc references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add jobsync pod check (ringtail k3s) and HTTP endpoint to `services-check`
- Add JobSync entry to homepage dashboard under new "Apps" group
- Mark jobsync as reviewed at v1.1.4 (current with upstream)
- Changelog fragment added
## Deployment and Testing
- [ ] Sync homepage app from branch: `argocd app set homepage --revision review/jobsync && argocd app sync homepage`
- [ ] Verify JobSync appears on go.ops.eblu.me dashboard
- [ ] Run `mise run services-check` to verify new checks pass
- [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage`
Reviewed-on: #291
## Summary
C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.
### Mikado Graph
```
deploy-jobsync (goal)
├── build-jobsync-container
│ └── mirror-jobsync
└── integrate-jobsync-ollama
```
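In a Mikado graph, leaves are worked first and the goal last. A small sketch of deriving one valid work order from the tree above with Python's standard-library `graphlib`; this is illustrative only, not tooling the repo is known to use.

```python
from graphlib import TopologicalSorter

# Each node maps to the prerequisites that must be done before it (from the graph above).
MIKADO = {
    "deploy-jobsync": {"build-jobsync-container", "integrate-jobsync-ollama"},
    "build-jobsync-container": {"mirror-jobsync"},
    "integrate-jobsync-ollama": set(),
    "mirror-jobsync": set(),
}


def work_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every prerequisite precedes the node that needs it."""
    return list(TopologicalSorter(graph).static_order())
```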
### What is JobSync?
Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.
### Key Decisions
- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed
### Current Status
Planning phase — cards committed, ready for review before implementation begins.
Reviewed-on: #288
Phase 1 of Dagger upgrade: update the CLI in the runner container first
so CI can build the new image with the old engine version. See
[[upgrade-dagger]] for the full procedure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bare image references in manifests were ambiguous: it was unclear whether
the tag was intentionally omitted or managed by kustomize. Add a :kustomized
sentinel to all 37 image refs overridden by the kustomize images transformer.
Add sync notes for the tailscale-operator ProxyClass (its CRD fields are not
processed by kustomize). Mark devpi reviewed (6.19.1 is current).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
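With the :kustomized sentinel in place, a bare image reference becomes detectable mechanically. A hypothetical lint is sketched below: every `image:` line must carry an explicit tag, a digest, or the sentinel. It uses a line-oriented regex rather than a YAML parser, so it is a sketch, not a check this commit adds.

```python
import re

# Matches `image: <ref>` lines in a YAML manifest (a simplification; not a YAML parser).
_IMAGE_RE = re.compile(r"^\s*-?\s*image:\s*[\"']?([^\s\"'#]+)", re.MULTILINE)


def untagged_images(manifest: str) -> list[str]:
    """Return image refs with no explicit tag, no digest, and no :kustomized sentinel."""
    bare = []
    for ref in _IMAGE_RE.findall(manifest):
        if "@" in ref:
            continue  # digest-pinned, e.g. name@sha256:...
        # Only inspect the last path segment so a registry port (host:5000) is not mistaken for a tag.
        if ":" not in ref.rsplit("/", 1)[-1]:
            bare.append(ref)
    return bare
```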