Commit graph

162 commits

Author SHA1 Message Date
9bafe85b2b Add teslamate extensions to DR restore procedure
The earthdistance extension (depends on cube) must be created before
restoring the teslamate database — discovered missing after 2026-04-13 DR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:12:26 -07:00
fe0e913963 Switch Fly proxy to upstream keepalive pools (#337)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m37s
## Summary

- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)

## Trade-off

DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.

## Still TODO on this branch

- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #337
2026-04-17 16:39:52 -07:00
3ecd888537 Switch container builds to manual-only workflow dispatch
Shared Dagger helpers (src/blumeops/) affect all Dagger-built containers,
making path-based auto-triggers unreliable. All builds now go through
`mise run container-build-and-release <name>`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:25:14 -07:00
223b134776 Document uv.lock as the source of devpi dependency in Dagger builds
The lockfile bakes in devpi URLs — Dagger does a locked install, not
fresh resolution. This is the mechanism behind the cold-cache failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:41:45 -07:00
ccaef4c1a7 Document devpi cold cache failure mode and deploy teslamate v3.0.0-08c698e
After a DR rebuild, devpi's empty cache causes race conditions under
concurrent load — metadata is served but wheel files 404. Also deploys
the first container.py-built teslamate image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:38:06 -07:00
08c698e833 Migrate teslamate to native Dagger container.py (#333)
Some checks failed
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (teslamate) (push) Failing after 6s
## Summary
- Replace legacy Dockerfile with native Dagger `container.py` build
- Two-stage pipeline: Elixir+Node builder, Debian slim runtime
- Uses shared helpers (`clone_from_forge`, `oci_labels`)
- Delete old Dockerfile (pipeline auto-discovers container.py)
- Update build-container-image docs and mark service reviewed

## Test plan
- [x] `dagger call build --src=. --container-name=teslamate` succeeds locally
- [ ] CI container build passes
- [ ] Deploy from branch and verify teslamate starts cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #333
2026-04-14 07:20:52 -07:00
4ca0630d76 Review enforce-tag-immutability doc: add review date and zot reference link
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:00:55 -07:00
d7c3c687f4 Document DR rebuild procedure and update restart-indri
- New how-to: rebuild-minikube-cluster with full bootstrap procedure
  validated during 2026-04-13 DR event
- Update restart-indri: warn about minikube delete, macOS permission
  dialog on first Tailscale SSH, forgejo_actions_secrets dep cycle
- Update disaster-recovery reference: link to rebuild procedure
- Update CLAUDE.md: never run minikube delete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:07:54 -07:00
61fcd5d70a Upgrade grafana-sidecar 1.28.0 → 2.6.0 + container.py port (#332)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dagger (grafana-sidecar) (push) Successful in 1m50s
## Summary

- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note

## Test plan

- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`

Reviewed-on: #332
2026-04-13 07:57:13 -07:00
6e60287e99 Doc review: delete install-dagger-on-nix-runner, add service-versions ref card
Outdated leaf card removed; zot.md now links to new service-versions
reference card instead. Added reverse link from review-services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:52:38 -07:00
8d80a4a3a5 Rewrite runner-logs: API-based log fetching, multi-repo support
Replace broken SSH+filesystem log retrieval with Forgejo web API
endpoint. Fix CLI to use run numbers (not task IDs), add --repo
for querying any forge repo (e.g. sporks), --limit/-n for listing
size. Document runner-logs as the way to verify build success in
CLAUDE.md and container build docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:42:58 -07:00
138e23d525 Miniflux 2.2.19 + container.py migration + ty typechecker (#331)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 1m3s
## Summary

- Upgrade miniflux from 2.2.17 to 2.2.19 (security hardening, performance improvements)
- Migrate miniflux from Dockerfile to native Dagger container.py build
- Refactor `alpine_runtime()` helper to support existing users (nobody/65534)
- Add `ty` (Astral) Python typechecker to prek hooks

## Test plan

- [ ] `dagger call build --src=. --container-name=miniflux` succeeds
- [ ] `dagger call container-version --container-name=miniflux` returns 2.2.19
- [ ] `mise run container-version-check` passes
- [ ] `ty check` passes cleanly
- [ ] `prek run --all-files` passes
- [ ] CI builds container successfully
- [ ] Miniflux healthcheck passes after deploy from branch

Reviewed-on: #331
2026-04-12 08:54:32 -07:00
c86b5d7772 Native Dagger container builds + Navidrome v0.61.1 (#330)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (navidrome) (push) Successful in 22m26s
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers

## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.

## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```

## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me

## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.

Reviewed-on: #330
2026-04-11 17:11:56 -07:00
a059d81314 Add review-compliance-reports task and reorganize report storage
New mise task fetches Prowler reports from sifaka, parses with proper
muted/unmuted distinction, shows week-over-week delta, and includes
a scaffold for Kingfisher once JSON/CSV output is available upstream.

Moved all legacy top-level reports on sifaka into date subdirectories
to match the current CronJob output structure. Updated
read-compliance-reports doc with task reference and links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:16:46 -07:00
64200a55c5 Migrate Immich from Helm chart to kustomize manifests (v2.5.6 → v2.6.3)
Replace the Helm chart deployment with plain kustomize manifests following
the Authentik pattern (separate deployments per component). Consolidate
the immich-storage ArgoCD app into the main immich app. Add no-helm-policy
doc establishing kustomize as the standard deployment mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:42:25 -07:00
baee7ae54b Review single-user-cluster control and add evidence collection card
Stamp single-user-cluster last-reviewed to 2026-04-01 after verifying
Tailscale ACLs and kubeconfig distribution. Add aspirational how-to card
documenting what PCI DSS evidence collection would look like (CCW,
artifacts, Drata workflow). Link from existing review process card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:01:57 -07:00
a18a424866 Pin NixOS service versions via nixpkgs-services overlay (#321)
## Summary

- Add `nixpkgs-services` flake input pinned to a specific nixpkgs commit, with an overlay that pulls `forgejo-runner`, `snowflake`, and `k3s` from it instead of the rolling `nixpkgs`
- Dagger `flake-update` pipeline now excludes `nixpkgs-services` via `--exclude`
- Fix stale nix-container-builder version in service-versions.yaml (was 12.6.4, actually running 12.7.2)
- Add k3s and minikube to service-versions.yaml tracking
- Document the pinning approach in review-services how-to and ringtail reference

## Motivation

During service review, discovered that flake updates had silently upgraded forgejo-runner from 12.6.4 → 12.7.2 without updating service-versions.yaml. This "sneak-in upgrade" bypasses the service review process. The overlay ensures these three services only change versions deliberately.

## Test plan

- [ ] Verify `nix flake update` from `nixos/ringtail/` does not change `nixpkgs-services` lock entry
- [ ] Verify `mise run provision-ringtail` builds successfully with the overlay
- [ ] Confirm running service versions unchanged after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #321
2026-04-01 21:37:57 -07:00
4059b3d27b Add compensating controls framework and date-based report dirs (#320)
## Summary

- Add `compensating-controls.yaml` tracking 9 named controls that justify suppressed security findings
- Update all Prowler mutelist descriptions with `CC: <id>` references to named controls
- Add `mise run review-compensating-controls` task — surfaces stalest control with all codebase references
- Add [[review-compensating-controls]] how-to doc
- Organize Prowler and Kingfisher reports into `YYYY-MM-DD` subdirectories

### Compensating controls

| ID | Mitigates |
|----|-----------|
| `single-user-cluster` | Image cache abuse, RBAC breadth, system pod privileges |
| `tailscale-network-isolation` | Profiling endpoints, weak TLS, debug ports |
| `local-registry` | AlwaysPullImages gap |
| `sso-gated-admin-tools` | ArgoCD wildcard RBAC |
| `operator-managed-pods` | Tailscale proxy pod security settings |
| `ephemeral-privileged-jobs` | Prowler hostPID exposure |
| `trusted-ci-only` | Forgejo runner DinD |
| `init-container-isolation` | Grafana root init container |
| `observability-stack-audit` | Missing apiserver audit logging |

## Test plan

- [ ] `mise run review-compensating-controls` shows table and references
- [ ] `kubectl kustomize argocd/manifests/prowler/` renders correctly
- [ ] Sync prowler and kingfisher, verify next scan writes to dated subdirectory
- [ ] Grep for `CC:` in mutelist files — every muted finding should have at least one

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #320
2026-03-30 17:44:11 -07:00
f9206bf10b Build custom Kingfisher container from sporked deploy branch (#318)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-nix (kingfisher) (push) Successful in 12s
## Summary

- Add Dockerfile for Kingfisher built from source (sporked deploy branch)
- Multi-stage: Rust build with Boost/vectorscan, debian-slim runtime
- Switch CronJob from upstream `ghcr.io/mongodb/kingfisher` to `registry.ops.eblu.me/blumeops/kingfisher`
- Add kingfisher to service-versions.yaml (version tracks upstream main SHA)
- Document spork workflow in CLAUDE.md

## Test plan

- [ ] Build container: `mise run container-build-and-release kingfisher 1d37d29`
- [ ] Verify image on registry: `mise run container-list`
- [ ] Update kustomization newTag
- [ ] Sync ArgoCD kingfisher app from branch
- [ ] Trigger manual CronJob and verify scan completes
- [ ] Verify reports on sifaka

Reviewed-on: #318
2026-03-30 06:34:49 -07:00
99df78664e Note upstream history rewrite as a spork sync failure mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:22:11 -07:00
6ecfaf02b6 Add spork strategy: tooling and documentation
Spork-create mise task sets up a floating-branch soft-fork of a
mirrored upstream project with daily mirror-sync via Forgejo Actions.
Includes explanation card, how-to guides for setup and branch
management, and the spork-create uv script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:58:10 -07:00
2bd1611ac1 Document sifaka NFS/Tailscale TUN troubleshooting
Sifaka's Tailscale can revert to userspace networking after package
updates, causing NFS mounts to fail because the NFS daemon sees
127.0.0.1 instead of the client's Tailscale IP. Added troubleshooting
how-to doc and updated sifaka reference card with frigate export and
TUN requirement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:12:00 -07:00
3017f759a7 Migrate Forgejo from Homebrew to source build (#316)
## Summary

- Migrate Forgejo from Homebrew to source-built binary with mcquack LaunchAgent
- Matches the established pattern used by zot, caddy, and alloy
- Upgrades to v14.0.3 (7 security fixes: PKCE bypass, OAuth scope bypass, open redirect, and more)

## Changes

- **Ansible role**: Replace brew install/services with binary stat check + LaunchAgent
- **Paths**: `/opt/homebrew/var/forgejo` → `~/forgejo`, binary at `~/code/3rd/forgejo/forgejo`
- **Run user**: `forgejo` → `erichblume` (LaunchAgent user; SSH git user stays `forgejo`)
- **Docs**: Updated Forgejo reference card, restart-indri guide
- **Service review**: Stamped frigate-notify, cloudnative-pg, blumeops-pg as current

## One-time migration steps (manual, on indri)

1. Clone from Codeberg, add forge mirror remote
2. Check out v14.0.3, build with `make build && make forgejo`
3. Stop brew, `cp -a` data to `~/forgejo`, fix ownership
4. Run `provision-indri --tags forgejo`
5. Verify, then `brew uninstall forgejo`

## Data safety

- `cp -a` preserves everything (repos, SQLite DB, LFS, sessions, OAuth config)
- Brew version stays installed as rollback until verification passes
- No schema changes between 14.0.2 → 14.0.3

Reviewed-on: #316
2026-03-28 08:19:23 -07:00
66a47738dd Add ringtail post-deploy maintenance: kernel check, generation pruning, GC
Update manage-lockfile doc with post-deploy steps (kernel update detection,
reboot guidance, generation pruning). Add prune-ringtail-generations mise
task that keeps the 5 most recent generations plus the most recent one
matching the booted kernel for safe rollback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:55:45 -07:00
687e972713 Review CV doc and close build-dep review gap
Fix stale CV service doc (URL, forge domain, container tag) and add
guidance for reviewing build-time dependencies in private forge repos
during service reviews.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:12:38 -07:00
fe201a495c Add Prowler IaC scanning of blumeops repo (Saturday 2am)
Clone repo in init container, scan Dockerfiles and K8s manifests
with Prowler's IaC provider (Trivy). Reports written to
sifaka:/volume1/reports/prowler-iac/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:49:38 -07:00
696024306c Add Prowler image vulnerability scanning for blumeops containers
All checks were successful
Build Container / detect (push) Successful in 39s
Build Container / build-dockerfile (prowler) (push) Successful in 10m15s
Add Trivy to the Prowler container for image and IaC scanning.
New CronJob (Saturday 3am) scans all blumeops/* images in the
registry for CVEs, embedded secrets, and Dockerfile misconfigs.
Reports written to sifaka:/volume1/reports/prowler-images/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:43:08 -07:00
d021b3534f Deploy Prowler CIS scanner (#310)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dockerfile (prowler) (push) Successful in 10s
## Summary
- Deploy Prowler 5 as a weekly CronJob on minikube-indri for CIS Kubernetes Benchmark v1.11 scanning
- Custom slim container build (strips PowerShell, Trivy, and non-K8s providers from upstream)
- Reports (HTML, CSV, JSON-OCSF) written to NFS share on sifaka at `/volume1/reports/prowler/`
- Read-only ClusterRole for pod, RBAC, and control plane inspection
- Host path mounts + hostPID for kubelet file permission checks

## Follow-ups
- Mirror prowler-cloud/prowler on forge for supply chain control
- Build and push container image, update kustomization.yaml newTag
- Consider adding k3s-ringtail scanning (core + RBAC checks only)

## Test plan
- [ ] Build container: `mise run container-release prowler v5.22.0`
- [ ] Update `argocd/manifests/prowler/kustomization.yaml` newTag to built image tag
- [ ] Sync ArgoCD: `argocd app sync apps && argocd app set prowler --revision deploy-prowler && argocd app sync prowler`
- [ ] Trigger manual job: `kubectl create job --from=cronjob/prowler prowler-manual -n prowler --context=minikube-indri`
- [ ] Verify reports appear on sifaka NFS share
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #310
2026-03-24 16:08:09 -07:00
fd0bebb0fc Localize authentik-redis container (#309)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dockerfile (alloy) (push) Successful in 12s
Build Container / build-dockerfile (ntfy) (push) Successful in 11s
Build Container / build-nix (alloy) (push) Successful in 20s
Build Container / build-nix (authentik) (push) Successful in 6m10s
Build Container / build-nix (authentik-redis) (push) Successful in 20s
Build Container / build-nix (ntfy) (push) Successful in 6s
## Summary

- Replace upstream `docker.io/library/redis:7-alpine` (Redis 7.4.8) with a nix-built container using Redis 8.2.3 from nixpkgs
- Introduce **attached service pattern**: `parent` field in service-versions.yaml, `<parent>-<component>` naming convention, and `assert pkgs.redis.version == version` in default.nix to prevent silent version drift on `flake.lock` updates
- Document the pattern in [[review-services]] so future attached services slot in cleanly
- Backfill `parent: grafana` on existing `grafana-sidecar` entry

## Version drift protection

1. `flake.lock` update bumps nixpkgs redis → `assert` in `default.nix` breaks `nix-build`
2. Developer updates `version` in `default.nix` → prek's `container-version-check` demands matching `service-versions.yaml` update
3. Both must agree before commit succeeds

## Test plan

- [ ] Build container from branch on ringtail (`mise run container-build-and-release authentik-redis`)
- [ ] Update kustomization `newTag` to branch-built image tag
- [ ] Sync authentik ArgoCD app from branch (`argocd app set authentik --revision localize-redis && argocd app sync authentik`)
- [ ] Verify Authentik login, session persistence, and task queue still work
- [ ] After merge: C0 follow-up to update `newTag` to the main-built image tag

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #309
2026-03-24 13:27:36 -07:00
fc45989a6c Decommission JobSync service (#308)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary

- Remove all JobSync infrastructure: ArgoCD app, k8s manifests, container build (nix), Caddy reverse proxy entry, Homepage dashboard entry, service-versions tracking, and all documentation
- Runtime teardown already completed: ArgoCD app cascade-deleted (removes deployment, PVC, service, ingress, external-secret), forge mirror deleted, 1Password item archived, local clone removed

## Motivation

Replacing JobSync with a datasette-based job tracking pipeline driven by mise tasks and a Claude agent frontend. JobSync's Next.js server actions don't expose a useful API for automation.

## Remaining manual steps after merge

- Provision Caddy to remove the stale proxy route: `mise run provision-indri -- --tags caddy`
- Sync Homepage: `argocd app sync homepage`
- Verify namespace cleanup on ringtail: `kubectl get ns jobsync --context=k3s-ringtail` (should be gone)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #308
2026-03-24 08:44:23 -07:00
bd0ff30d3f Unify container build workflows (#306)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary
- Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow
- Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix`
- Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo
- Containers with both build files (alloy, nettest, ntfy) get built on both runners

## Test plan
- [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only
- [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only
- [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners
- [ ] Test workflow_dispatch with a specific container name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #306
2026-03-23 20:55:50 -07:00
6d65e6928c C2: Deploy infrastructure alerting pipeline (#303)
## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**
- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes

**Chain (bottom-up):**
1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API
5. `deploy-infra-alerting` — goal card

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303
2026-03-22 14:52:56 -07:00
e5ce510fdc Fix plan-a-meal random recipe API queries
Mealie's orderBy=random requires a paginationSeed parameter, otherwise
the API returns 422. Added the seed to all random query examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:10:48 -07:00
6d7597670e Add plan-a-meal how-to for Mealie cooking timelines
Agent-facing guide for generating unified cooking timelines from
Mealie meal plans. Covers querying the API, picking balanced meals
(protein/carb/vegetable), and interleaving recipe steps into a
relative timeline so everything finishes together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:07:16 -07:00
0f5377568d Review operations docs: add last-reviewed dates and improve troubleshooting
Mark run-1password-backup and troubleshooting as reviewed. Troubleshooting
gets inline wiki-links for all referenced services, a new ringtail/k3s
section, and a cross-reference to restart-indri.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 07:38:02 -07:00
f46a04b902 Restructure docs: consolidate, recategorize, and extract
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
- Consolidate 4 Authentik Nix derivation docs into one card
  (authentik-nix-build-components.md)
- Merge build-grafana-container + build-grafana-sidecar into
  build-grafana-images.md
- Move agent-change-process from how-to/ to explanation/ (it's a
  methodology doc, not a task guide)
- Extract Caddy custom build section from reference card into
  how-to/deployment/build-caddy-with-plugins.md
- Move expose-service-publicly from how-to/ to tutorials/ (it's a
  comprehensive walkthrough, not a quick task reference)
- Update all wiki-link references across affected docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 19:55:59 -07:00
5b008a6ab6 Document ai-sources in AI guide, change process, and mise-tasks ref
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:43:39 -07:00
4d195f7fb4 Review restore-1password-backup doc: fix offsite TBD, clarify archive name, add BorgBase to backups
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:13:07 -07:00
8b3b17d555 Review restart-indri doc: fix Caddy/Jellyfin service management, fix docs-preview path handling
- Caddy is now a mcquack LaunchAgent, not brew services
- Add missing Jellyfin and Caddy to shutdown commands and autostart list
- docs-preview: accept paths with or without docs/ prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 10:09:38 -07:00
4c5e7d763d Review deploy-jobsync doc: add missing env var, update tag example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 15:45:07 -07:00
8b9cc4effd Add how-to card for running 1Password backup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:17:45 -07:00
d5a92fead8 Review build-jobsync-container, refine docs-preview tooling
- Review build-jobsync-container.md: fix nonexistent `mirror-sync` task
  reference (Forgejo mirrors sync automatically), mark reviewed
- Remove bat hint from docs-review checklist (output not visible in
  agent sessions), keep docs-preview hint as user-facing step
- Simplify review-documentation.md visual preview section
- Fix Python 3.14 tarfile deprecation warning in docs-preview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:11:34 -07:00
d01a165b91 Add docs-preview task and visual preview step to doc review
New `mise run docs-preview <card>` task builds docs via Dagger and serves
them locally in the production quartz container (image parsed from ArgoCD
kustomization), opening the browser directly to the specified card.
Container auto-cleans after 1 hour.

Also updates docs-review checklist and review-documentation how-to to
reference the visual preview workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:04:01 -07:00
4f0476a851 Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 10s
## Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)

**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs

Reviewed-on: #290
2026-03-09 11:59:43 -07:00
770a7b2d6a Add JobSync reference card, observability docs, and RAPIDAPI_KEY plumbing (#289)
## Summary
- Add JobSync service reference card (`docs/reference/services/jobsync.md`) with architecture, secrets, observability, and JSearch API docs
- Add JobSync and Ollama to ringtail's workloads table (both were missing)
- Add JobSync to the reference index
- Wire `RAPIDAPI_KEY` through ExternalSecret and deployment env var for JSearch job search automation
- Document Loki log queries for observability (no metrics endpoint exists)
- Update deploy-jobsync how-to with new env var, observability section, and reference card link

## Deployment and Testing
- [ ] Sign up for RapidAPI JSearch API (free tier: 500 req/month)
- [ ] Add `rapidapi_key` field to "JobSync" 1Password item
- [ ] Merge PR
- [ ] `argocd app sync jobsync` to pick up new env var
- [ ] Verify job search works at https://jobsync.ops.eblu.me/dashboard/automations

Reviewed-on: #289
2026-03-08 15:06:52 -07:00
3a811fb188 Deploy JobSync — job search tracker on ringtail k3s (#288)
All checks were successful
Build Container (Nix) / detect (push) Successful in 1s
Build Container / detect (push) Successful in 2s
Build Container / build (jobsync) (push) Successful in 2s
Build Container (Nix) / build (jobsync) (push) Successful in 8s
## Summary

C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.

### Mikado Graph

```
deploy-jobsync (goal)
├── build-jobsync-container
│   └── mirror-jobsync
└── integrate-jobsync-ollama
```

### What is JobSync?

Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.

### Key Decisions

- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed

### Current Status

Planning phase — cards committed, ready for review before implementation begins.

Reviewed-on: #288
2026-03-08 11:02:05 -07:00
0e09521ce3 Review manage-flyio-proxy.md — no issues found
Add last-reviewed date. Content is accurate and complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:46 -08:00
6a033d55be Review and update review-services.md
- Add last-reviewed date
- Align service type sections with actual types (argocd/ansible/nixos)
- Remove nonexistent "Helm Chart" and "Hybrid" sections
- Fold custom container guidance into ArgoCD section
- Reference kustomization.yaml for image tags instead of Helm charts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:08 -08:00
e47a3b2ebb Review and update review-documentation.md
- Add last-reviewed date
- Replace raw pulumi commands with mise task equivalents
- Reference C0/C1/C2 change classification for making changes
- Note that prek handles link validation automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:59:51 -08:00
ba7236ade0 Add how-to guide for upgrading Dagger
Documents the correct two-phase upgrade procedure to avoid the
chicken-and-egg problem where CI can't build its own replacement.
Also fixes outdated version references in the Dagger reference card.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:31:30 -08:00