Commit graph

21 commits

Author SHA1 Message Date
c86b5d7772 Native Dagger container builds + Navidrome v0.61.1 (#330)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (navidrome) (push) Successful in 22m26s
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers

## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.

## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```

## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me

## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.

Reviewed-on: #330
2026-04-11 17:11:56 -07:00
07e9c810ca Add RuntimeDefault seccomp profiles to all managed workloads
Addresses 32 CIS Kubernetes Benchmark failures from Prowler scan
(core_seccomp_profile_docker_default). Applied pod-level seccomp
RuntimeDefault to 18 deployments/statefulsets and 2 cronjobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:19:40 -07:00
3d2a97aaf9 Update kustomization tags to OCI-labeled builds (613f05d)
Point all services at the 613f05d images which carry the new
consistent OCI labels. Skipped kiwix/transmission (old v4.0.6-r4
version, no matching build) and docs/quartz (no 613f05d build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:34:12 -07:00
96d0f668fd Reorganize Homepage groups: add Home, move Grafana to Infrastructure
Move NVR, Jellyfin, and DJ to new Home group. Move Grafana from Content
to Infrastructure. Switch all layout groups from column to row style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 06:15:52 -07:00
6e8d11c6bb Add :kustomized sentinel tag to manifest images, review devpi
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 08:15:06 -08:00
e0f9ebebdf Update homepage, navidrome, ntfy, miniflux image tags after mirror migration
Prometheus and teslamate builds still in progress — will update in a
follow-up commit once their 33b7f0f tags land.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 21:06:08 -08:00
9b44a8ec51 Add kustomize images: and configMapGenerator: across services (#264)
## Summary

- Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service
- Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes
- Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical)

### Services modified

**Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent

**Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus

**Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is)

**Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml)

### What changes at deploy time

- **images:** — no functional diff, `kustomize build` produces identical output with tags
- **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change

## Test plan

- [x] `kubectl kustomize` builds all 30 service directories successfully
- [x] Image tags verified in rendered output for all modified services
- [x] ConfigMap hash suffixes verified in rendered output
- [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names
- [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.)
- [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes
- [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
e1c2892878 Fix container tags deleted during old-tag cleanup
Five container manifests were removed when deleting old-style tags
(shared digests). Rebuild on a72a0d8 and update references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:26:29 -08:00
a72a0d8e8e Update all container images to new upstream-version tagging scheme (#238)
## Summary
- Updates all 15 container image references across 14 ArgoCD manifest files
- Migrates from old internal `vX.Y.Z` tags to new `v<upstream-version>-<sha>` format
- Covers: authentik, cv, devpi, forgejo-runner, homepage, kiwix-serve, kubectl, miniflux, navidrome, ntfy, quartz, teslamate, transmission

## Deployment and Testing
- [ ] Sync all ArgoCD apps on branch revision
- [ ] Verify all services come up healthy
- [ ] Merge and re-sync on main
- [ ] Clean up old-style tags from zot registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/238
2026-02-21 15:58:11 -08:00
74294094e3 Fix navidrome custom container image v1.0.2 (#194)
## Summary
- Switch navidrome deployment from upstream `deluan/navidrome:0.60.3` back to custom image `registry.ops.eblu.me/blumeops/navidrome:v1.0.2`
- The v1.0.1 image was tagged before the `USER 65534` removal commit, so it still ran as a non-root user that couldn't write to the SQLite data directory
- v1.0.2 is built from current main which includes both the `zlib-dev` build fix and the non-root user removal

## Deployment and Testing
- [ ] Wait for CI to build `navidrome:v1.0.2` image
- [ ] Sync via ArgoCD and verify pod starts without CrashLoopBackOff
- [ ] Verify navidrome UI accessible at https://navidrome.ops.eblu.me

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/194
2026-02-16 08:24:33 -08:00
6c41338b36 Revert navidrome to upstream image pending container fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 08:22:04 -08:00
accbb80683 Update navidrome image to v1.0.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 08:18:17 -08:00
996441876d Document container build pattern and port navidrome (#192)
Some checks failed
Build Container / build (push) Failing after 4m28s
## Summary
- Add how-to guide (`docs/how-to/build-container-image.md`) covering the full container build workflow: directory layout, Dagger local builds, mise release task, and common patterns with links to existing containers
- Port navidrome from upstream `deluan/navidrome:0.60.3` to a custom three-stage build (`containers/navidrome/Dockerfile`) using Node + Go + Alpine
- Update navidrome deployment to use `registry.ops.eblu.me/blumeops/navidrome:v1.0.0`

## Deployment and Testing
- [x] `dagger call build --src=. --container-name=navidrome` builds successfully
- [ ] After merge: `mise run container-tag-and-release navidrome v1.0.0`
- [ ] After image published: `argocd app sync navidrome` and verify pod starts

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/192
2026-02-15 08:05:11 -08:00
8f4708e26f Fix navidrome image tag: remove v prefix (0.60.3 not v0.60.3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:23:12 -08:00
b3747f6c95 Tier 1 version bumps (#186)
All checks were successful
Build Container / build (push) Successful in 8s
## Summary

Audit and upgrade of all deployed images, helm charts, and custom container Dockerfiles to latest stable versions. This PR covers Tier 1 (low-risk minor/patch bumps only).

### Upstream images
| Image | Old | New |
|-------|-----|-----|
| kube-state-metrics | v2.13.0 | v2.18.0 |
| prometheus | v3.2.1 | v3.9.1 |
| loki | 3.3.2 | 3.6.5 |
| alloy | v1.5.1 | v1.13.1 |
| tailscale (proxy + operator) | v1.92.5 | v1.94.1 |
| navidrome | :latest | v0.60.3 (pinned) |

### Helm charts
| Chart | Old | New |
|-------|-----|-----|
| CloudNativePG | v0.27.0 | v0.27.1 |
| 1Password Connect | 2.2.1 | 2.3.0 |

### Custom containers (Dockerfiles updated, images not yet tagged)
| Container | Changes | New tag |
|-----------|---------|---------|
| miniflux | 2.2.16→2.2.17 (security), alpine 3.22 | v1.1.0 |
| kubectl | v1.34.1→v1.34.4, alpine 3.22 | v1.1.0 |
| kiwix-serve | alpine 3.22 | v1.1.0 |
| nettest | alpine 3.22 | v0.14.0 |
| transmission | alpine 3.22, pkg 4.0.6-r4 | v1.1.0 |

All custom containers verified with local `dagger call build`.

### Deferred to Tier 2 (separate PRs)
- Forgejo runner 6→12 (major version scheme change)
- Docker DinD 27→29
- Grafana chart 8→11 (repo migration)
- External Secrets 1→2 (breaking changes)
- Python 3.12→3.13, Elixir 1.18→1.19, Node 22→24
- Transmission 4.0.6→4.1.0 (not in Alpine yet)

## Deployment

After merge:
1. Tag custom containers: `mise run container-tag-and-release <name> <version>` for each
2. Wait for CI builds to complete
3. `argocd app sync apps` then sync individual apps, or let ArgoCD auto-detect

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/186
2026-02-13 17:16:37 -08:00
48ce5b4120 Recategorize homepage into Content and Misc groups (#179)
## Summary
- Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups
- **Content**: Immich, Kiwix, Miniflux, DJ, Grafana
- **Misc**: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI

## Deployment and Testing
- [ ] Sync affected ingresses via ArgoCD (all 11 services)
- [ ] Verify homepage shows the two new groups correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179
2026-02-13 09:09:22 -08:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
1e13d4b83d Fix Navidrome automatic library scan schedule (#101)
## Summary
- Fix env var name from `ND_SCANSCHEDULE` to `ND_SCANNER_SCHEDULE` (Navidrome uses viper config where dots become underscores)
- Use explicit `@every 1h` format for clarity
- Reorder CLAUDE.md rules to emphasize running zk-docs first

## Root Cause
Navidrome logs showed "Periodic scan is DISABLED" at startup despite the env var being set. The config key is `scanner.schedule`, which translates to `ND_SCANNER_SCHEDULE` (not `ND_SCANSCHEDULE`).

## Deployment and Testing
- [ ] Sync navidrome app: `argocd app sync navidrome`
- [ ] Verify pod restarts with new env var
- [ ] Check logs for "Scheduling scanner" message instead of "Periodic scan is DISABLED"
- [ ] Wait ~1 hour and confirm scan runs automatically

🤖 Generated with [Claude Code](https://claude.ai/code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/101
2026-02-04 07:23:12 -08:00
4d97ac4c26 Expand homepage widgets and info panels (#81)
## Summary
- Add greeting and datetime info widgets to homepage header
- Add Miniflux widget showing unread/read counts (via existing API key in 1Password)
- Add Grafana widget showing dashboards/datasources/alerts (via existing credentials in 1Password)
- Add ArgoCD to bookmarks section
- Add TODO comments for widgets needing additional setup (Forgejo, Caddy, UniFi, Glances, Navidrome, Transmission, Immich)

## Deployment and Testing
- [ ] Sync homepage app to deploy new ExternalSecrets
- [ ] Verify greeting and datetime appear in header
- [ ] Verify Miniflux widget shows unread/read counts
- [ ] Verify Grafana widget shows dashboard stats
- [ ] Check that services without credentials still display (just without widgets)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/81
2026-02-02 16:11:20 -08:00
fc0ce955e4 Move DJ to Apps group on Homepage (#80)
## Summary
- Change navidrome homepage group from Media to Apps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/80
2026-01-31 20:29:43 -08:00
ade21cc49e Add Navidrome music streaming server (#79)
## Summary
- Deploy Navidrome music streaming server to k8s
- NFS mount for music library from sifaka:/volume1/music (read-only)
- Local PVC for SQLite database and config (10Gi)
- Tailscale ingress for dj.tail8d86e.ts.net
- Caddy reverse proxy for dj.ops.eblu.me
- Homepage annotations for dashboard discovery in Media group

## Deployment and Testing
- [ ] Sync `apps` application to pick up new Application definition
- [ ] Set navidrome app to feature branch and sync
- [ ] Verify NFS mount with `kubectl exec`
- [ ] Provision Caddy for dj.ops.eblu.me
- [ ] Access https://dj.ops.eblu.me and create initial admin user
- [ ] Verify Homepage shows DJ in Media group
- [ ] Reset to main and resync after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/79
2026-01-31 20:19:31 -08:00