Commit graph

640 commits

Author SHA1 Message Date
9d1e7eff12 C2(jobsync): close — deploy-jobsync
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 10:57:02 -07:00
c42d219cb6 C2(jobsync): impl — update image tag to v1.1.4-e51ec83-nix
Container with /usr/bin/env symlink for npx shebang resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:45:49 -07:00
e51ec83c41 C2(jobsync): impl — add /usr/bin/env symlink for npx-installed scripts
npx-downloaded prisma has `#!/usr/bin/env node` shebang. Nix containers
lack FHS paths; create the symlink in extraCommands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:44:36 -07:00
1124b07870 C2(jobsync): impl — update image tag to v1.1.4-6b36d53-nix
Simplified container with npx -y for prisma migrations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:40:21 -07:00
6b36d53bab C2(jobsync): impl — simplify: use npx -y for runtime prisma migrate
Instead of bundling prisma CLI and its deep dependency tree in the nix
image, use `npx -y prisma@6.19.0 migrate deploy` like upstream does.
npx downloads prisma at container startup — network is available at
runtime, only blocked during nix build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:26:13 -07:00
87930e5e31 C2(jobsync): impl — skip npm prune, copy prisma transitive deps
Prisma CLI (devDep) has a deep transitive dependency tree that must
be present at runtime for `migrate deploy`. Skip npm prune entirely
and explicitly copy all prisma packages and their transitive deps
into the output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:23:28 -07:00
2384a15cbf C2(jobsync): impl — update image tag to v1.1.4-fdac2e3-nix
Container with /tmp directory fix for prisma get-platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:14:39 -07:00
fdac2e3699 C2(jobsync): impl — add /tmp and /data directories to nix container
Prisma's get-platform module requires /tmp for temp files. Nix
containers don't create standard directories by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:10:11 -07:00
6187ade18f C2(jobsync): impl — update image tag to v1.1.4-846d879-nix
Container with fixed @prisma/engines copy and local prisma binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:06:16 -07:00
846d879b40 C2(jobsync): impl — fix @prisma/engines copy in installPhase
The cp -r of @prisma/ into an existing node_modules/@prisma/ nested
incorrectly. Use cp -rn with glob to merge contents instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:00:24 -07:00
8d272eee83 C2(jobsync): impl — update image tag to v1.1.4-27039e7-nix
Point kustomization at the container with the fixed entrypoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 08:46:05 -07:00
27039e7fe7 C2(jobsync): impl — fix entrypoint to use local prisma binary
npx is not available in the nix container. Call prisma directly via
node node_modules/prisma/build/index.js instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 08:39:42 -07:00
2220944a15 C2(jobsync): impl — ArgoCD app, k8s manifests, Caddy route
ArgoCD Application targeting ringtail k3s cluster.
Manifests: Deployment, Service, Tailscale Ingress, PVC (local-path),
ExternalSecret (1Password auth_secret + encryption_key).
Caddy route: jobsync.ops.eblu.me -> jobsync.tail8d86e.ts.net.
Ollama integration via OLLAMA_BASE_URL env var in deployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 08:33:13 -07:00
5b71bb2398 C2(jobsync): close build-jobsync-container, integrate-jobsync-ollama
build-jobsync-container: Updated with lessons learned (prisma-engines
from nixpkgs, Google Fonts sandbox workaround, arm64 vs x86_64).
integrate-jobsync-ollama: Configuration-only card, env var will be
set in the deployment manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 08:30:51 -07:00
b1616bc96b C2(jobsync): impl — fix nix derivation for ringtail build
Use nixpkgs prisma-engines to avoid network downloads in nix sandbox.
Patch out Google Fonts import (Inter) since sandbox blocks network;
falls back to system sans-serif font stack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:28:31 -08:00
874a967000 C2(jobsync): impl — nix container derivation and entrypoint
Add containers/jobsync/default.nix (buildNpmPackage + dockerTools) and
entrypoint.sh (prisma migrate + node server.js). Hashes are empty
placeholders — will be filled from first build attempt on ringtail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:08:38 -08:00
92f0b190b8 C2(jobsync): close mirror-jobsync
Mirror already exists at forge.ops.eblu.me/mirrors/jobsync from
previous cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:00:57 -08:00
60507ee719 C2(jobsync): plan — update cards with learnings from first attempt
build-jobsync-container: document prisma devDep pruning pitfall,
nix entrypoint path issue, and verification step.

deploy-jobsync: document service-versions.yaml requirement,
image tag format, and 1Password item already created.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:00:03 -08:00
15ceeb5f9d C2(jobsync): plan — Mikado cards for JobSync deployment
Cards:
- deploy-jobsync (goal): Deploy JobSync to ringtail k3s via ArgoCD
- build-jobsync-container: Nix container build (buildLayeredImage)
- mirror-jobsync: Mirror upstream to forge
- integrate-jobsync-ollama: Wire up existing Ollama for AI features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 20:43:14 -08:00
1c3bf35dad Fix mikado invariant check rejecting close without impl
A close commit with zero preceding impl commits is valid — some leaf
nodes involve operational steps (e.g., creating a mirror) with no code
changes. Removed the false-positive check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 20:41:03 -08:00
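The Mikado commits in this log share a subject shape such as `C2(jobsync): impl — ...` or `C2(jobsync): close mirror-jobsync`. A check like the one fixed in 1c3bf35dad has to recognize those phases first. Below is a minimal sketch of such a parser. The regex and the function name are assumptions inferred from the subjects shown here, not the project's actual invariant check.

```python
import re

# Hypothetical pattern inferred from subjects in this log. The dash after the
# phase (written as \u2014) is optional, since some close commits omit it.
SUBJECT_RE = re.compile(
    r"^(C\d)\(([^)]+)\): (plan|impl|close)\b\s*(?:\u2014\s*)?(.*)$"
)

def parse_subject(subject):
    """Return (change_class, scope, phase, rest), or None for ordinary commits."""
    match = SUBJECT_RE.match(subject)
    return match.groups() if match else None
```

With phases extracted this way, a chain that goes straight from `plan` to `close` can be accepted, which is the case the fix above allows.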
14e931591b Fix 1Password Connect numeric log levels misclassified in Grafana (#287)
## Summary
- 1Password Connect uses non-standard numeric log levels (`1`=error, `2`=warn, `3`=info, `4`=debug, `5`=trace) per [1Password/connect#44](https://github.com/1Password/connect/issues/44)
- Alloy extracts the `level` JSON field as-is, so info-level health checks get `level="3"` in Loki
- Grafana expects string level labels — numeric values are unrecognized, causing misclassified log severity/coloring
- Adds a `stage.match` + `stage.template` in the Alloy pipeline scoped to `{namespace="1password"}` to normalize numeric levels to standard strings
- Other services are completely unaffected (scoped by namespace, not global)

## Deployment and Testing
- [ ] Sync alloy-k8s from branch: `argocd app set alloy-k8s --revision fix/onepassword-numeric-log-levels && argocd app sync alloy-k8s`
- [ ] Wait ~2 minutes for new logs to flow
- [ ] Verify level labels: `curl -sG "http://localhost:3100/loki/api/v1/label/level/values" --data-urlencode 'query={namespace="1password"}'` should show `"info"` and `"warn"` instead of `"3"` and `"2"`
- [ ] Check Grafana log panel for 1password namespace — logs should no longer appear as errors
- [ ] After merge: `argocd app set alloy-k8s --revision main && argocd app sync alloy-k8s`

Reviewed-on: #287
2026-03-07 13:57:04 -08:00
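The level normalization described in 14e931591b amounts to a small lookup table. The sketch below shows that mapping in Python for illustration only. The actual fix lives in the Alloy pipeline (`stage.match` + `stage.template`), and the function name here is hypothetical.

```python
# Mapping as stated in the commit message: 1=error, 2=warn, 3=info, 4=debug, 5=trace.
NUMERIC_LEVELS = {
    "1": "error",
    "2": "warn",
    "3": "info",
    "4": "debug",
    "5": "trace",
}

def normalize_level(level):
    """Map a numeric 1Password Connect level to its string name.

    Values that are not in the table (already-normalized strings, for
    example) are returned unchanged apart from whitespace trimming.
    """
    key = str(level).strip()
    return NUMERIC_LEVELS.get(key, key)
```

Passing unknown values through unchanged mirrors the commit's intent that only the numeric levels are rewritten.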
d3f9699c41 Review cv and docs services — both healthy, no upgrades needed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:10:16 -08:00
0e09521ce3 Review manage-flyio-proxy.md — no issues found
Add last-reviewed date. Content is accurate and complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:46 -08:00
6a033d55be Review and update review-services.md
- Add last-reviewed date
- Align service type sections with actual types (argocd/ansible/nixos)
- Remove nonexistent "Helm Chart" and "Hybrid" sections
- Fold custom container guidance into ArgoCD section
- Reference kustomization.yaml for image tags instead of Helm charts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:08 -08:00
e47a3b2ebb Review and update review-documentation.md
- Add last-reviewed date
- Replace raw pulumi commands with mise task equivalents
- Reference C0/C1/C2 change classification for making changes
- Note that prek handles link validation automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:59:51 -08:00
590cb1d25d Document required preview directory for Frigate NFS volume
Frigate 0.17 does not auto-create clips/previews/<camera>/, causing
review page previews to silently fail with 500 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:46:23 -08:00
Forgejo Actions
2809ba6f50 Update docs release to v1.13.3
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 20:49:01 -08:00
55013db124 Add changelog fragment for Dagger v0.20.1 upgrade (tag: v1.13.3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:42:00 -08:00
b793299d6d Upgrade Dagger engine from v0.20.0 to v0.20.1
Phase 2 of Dagger upgrade: bump engine version, update runner
deployment to v0.20.1-24f7512, and fix docs reference card version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:41:02 -08:00
24f7512d59 Bump runner-job-image Dagger CLI from 0.20.0 to 0.20.1
Phase 1 of Dagger upgrade: update the CLI in the runner container first
so CI can build the new image with the old engine version. See
[[upgrade-dagger]] for the full procedure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:32:05 -08:00
ba7236ade0 Add how-to guide for upgrading Dagger
Documents the correct two-phase upgrade procedure to avoid the
chicken-and-egg problem where CI can't build its own replacement.
Also fixes outdated version references in the Dagger reference card.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:31:30 -08:00
Forgejo Actions
e95fb9a555 Update docs release to v1.13.2
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 19:03:24 -08:00
a7c21bd8a6 Update docs quartz container to v1.28.2-b64010b (tag: v1.13.2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:58:40 -08:00
b64010b3c7 Replace spider-trap nginx 404s with robots.txt disallowing /explorer/
The /explorer/ SPA endpoints were the source of all spider-trap traffic.
A robots.txt Disallow is a better fix than serving 404s — it prevents
crawlers from entering the infinite URL tree in the first place, avoids
serving large numbers of 404s that hurt SEO, and doesn't break legitimate
deep links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:34:37 -08:00
Forgejo Actions
8b0ff3d7a5 Update docs release to v1.13.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-06 10:00:42 -08:00
1537412c09 Update docs quartz container to v1.28.2-6636576 (tag: v1.13.1)
Picks up spider-trap nginx guards from 6636576.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:52:31 -08:00
6636576cdc Add spider-trap guards to docs.eblu.me Quartz nginx config
Block recursive crawler paths caused by SPA fallback + relative links:
/tags/ depth >1 returns 404, global depth ≥5 returns 404.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:43:41 -08:00
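The two depth rules in 6636576cdc can be sketched as a path check. This is an illustration in Python, not the nginx config from the commit. It assumes that "depth" means the number of non-empty path segments, and that for `/tags/` the depth is counted below the `tags` prefix. Both readings are guesses.

```python
def is_spider_trap(path):
    """Return True if the path would get a 404 under the assumed depth rules."""
    segments = [s for s in path.split("/") if s]
    # Rule 1 (assumed reading): anything more than one level below /tags/.
    if segments and segments[0] == "tags" and len(segments) - 1 > 1:
        return True
    # Rule 2: any path with five or more segments.
    return len(segments) >= 5
```

Under these assumptions `/tags/nix` is served, while `/tags/nix/tags/nix` and `/a/b/c/d/e` are rejected.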
6e8d11c6bb Add :kustomized sentinel tag to manifest images, review devpi
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 08:15:06 -08:00
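The sentinel convention from 6e8d11c6bb can be sketched as a check for bare image references. The helper names below are hypothetical, and the tag detection is a simplification: it looks for a colon in the last path component only, so a registry port is not mistaken for a tag.

```python
def has_tag_or_digest(image):
    """Return True if the image reference already carries a tag or digest."""
    if "@" in image:  # digest form, e.g. name@sha256:...
        return True
    last_component = image.rsplit("/", 1)[-1]
    return ":" in last_component

def add_sentinel(image, sentinel="kustomized"):
    """Append the sentinel tag to bare image references; leave others alone."""
    if has_tag_or_digest(image):
        return image
    return f"{image}:{sentinel}"
```

A check like this could flag manifests whose image lines still lack any tag, assuming the image name contains at least one path separator whenever the registry has a port.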
2ac1a1abc2 Update ringtail flake inputs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 07:54:51 -08:00
6d84fcfb05 Review how-to index: strip prose, add last-reviewed
Removed descriptions, table formatting, and Mikado chain commentary
from the how-to index — it should be links only. Added last-reviewed
date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 07:52:06 -08:00
Forgejo Actions
d98ef984ea Update docs release to v1.13.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-05 11:11:38 -08:00
46cc3fbc2e Update forgejo-runner job image to v0.20.0-448689b (tag: v1.13.0)
Built locally to break the chicken-and-egg: the old runner couldn't
build its own replacement because it needed Dagger 0.20.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 11:05:21 -08:00
448689bf2a Bump runner-job-image Dagger CLI from 0.19.11 to 0.20.0
CI: some checks failed (Build Container / build (runner-job-image) failing after 2s)
The Dagger module was upgraded to v0.20.0 in d15071a but the runner job
image still had the old CLI, causing build-blumeops to fail with a
version mismatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:58:38 -08:00
c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP─────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich    │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
d15071aaf9 Upgrade Dagger from v0.19.11 to v0.20.0 (#285)
## Summary
- Bump Dagger engine version from v0.19.11 to v0.20.0 in `dagger.json`
- Pin dagger CLI to `0.20.0` in `mise.toml` (was `"latest"`)
- Regenerated `.dagger/uv.lock` (new SDK deps: httpcore, beartype bump)

## Testing
- [x] `dagger call validate-workflows --src=.` passes on v0.20.0
- [ ] CI build workflow passes

Reviewed-on: #285
2026-03-05 09:32:13 -08:00
7bddc78c8a Add ExternalSecret default fields to prevent ArgoCD drift
The external-secrets operator adds conversionStrategy, decodingStrategy,
and metadataPolicy defaults to the live object, causing perpetual
OutOfSync in ArgoCD. Declare them explicitly to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 09:11:23 -08:00
405fc59c12 Add Authentik OIDC login for ArgoCD (#284)
## Summary
- Add Authentik OAuth2 provider + application blueprint for ArgoCD (ringtail side)
- Add OIDC config to ArgoCD ConfigMap with Authentik as identity provider (indri side)
- Map Authentik `admins` group to ArgoCD `role:admin` via RBAC policy
- ExternalSecrets on both sides pull `argocd-client-secret` from 1Password
- Local admin password remains as break-glass — both login methods coexist

## Pre-deployment manual step
Add `argocd-client-secret` field to "Authentik (blumeops)" in 1Password with a random value (e.g., `openssl rand -hex 32`).

## Deployment order
1. Sync Authentik app on ringtail first (blueprint + secret + worker env var)
2. Sync ArgoCD app on indri second (cm, rbac, ExternalSecret)

## Verification
- [ ] `argocd-client-secret` field added to 1Password
- [ ] Authentik app synced on ringtail — blueprint applied, provider created
- [ ] ArgoCD app synced on indri — OIDC config applied
- [ ] SSO login works: visit `https://argocd.ops.eblu.me` → "Log in via Authentik" → admin access
- [ ] Break-glass: local admin/password login still works

Reviewed-on: #284
2026-03-05 09:07:25 -08:00
c029e5851a Review migrate-forgejo-from-brew doc, fix stale Phase 3 reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:29:58 -08:00
91c755ddd6 Pin kiwix-serve image tag to v3.8.2-f6f0f79
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:17:40 -08:00
92364f7305 Remove suggestion to run prek manually from README
Hooks run automatically on git commit; no need to invoke separately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:15:25 -08:00