blumeops

Author	SHA1	Message	Date
Erich Blume	95364dcb48	Simplify runner image (Dagger Phase 3) (#162) ## Summary With Phases 1 and 2 complete, the runner image no longer needs most of its bundled tools. This PR strips it down and adds what was missing. Removed (now inside Dagger containers): - Node.js 24.x - Docker CLI + buildx plugin - skopeo - gnupg, lsb-release, xz-utils Added: - `tzdata` — fixes the TZ env var (#159, #160, #161) so `TZ=America/Los_Angeles` actually works - `flyctl` — was being installed from scratch every release Workflow changes: - Remove "Ensure Dagger CLI" bootstrap steps from both workflows (Dagger is in the image) - Remove "Install flyctl" step from build-blumeops (flyctl is in the image) - Remove job-level `TZ` from build-blumeops (moved to runner configmap `runner.envs`) - Set `TZ: America/Los_Angeles` in runner configmap so all job containers inherit it ## Deployment After merge: 1. Build and release the new runner image: `mise run container-release forgejo-runner v2.0.0` 2. Sync the runner: `argocd app sync forgejo-runner` 3. Verify: `kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date` (but the real test is running a docs release and checking the changelog date) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/162	2026-02-11 17:24:20 -08:00
Erich Blume	e84ffb7d7f	Set TZ on build-blumeops workflow job (#161) ## Summary The runner pod's `TZ` env var (#159, #160) doesn't propagate to workflow job containers — jobs run inside Docker containers spawned by the DinD sidecar, not in the runner process itself. Set `TZ: America/Los_Angeles` at the job level so `uvx towncrier build` uses the correct timezone. This is the actual fix for the Feb 12 changelog dates. The runner pod TZ is still useful for runner daemon logs but doesn't affect job execution. Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/161	2026-02-11 17:06:44 -08:00
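The "Feb 12 changelog dates" problem comes down to one fact: a date-stamped changelog records the calendar date of "now" in whatever zone the process believes it is in. The sketch below assumes towncrier stamps a release with the local calendar date and that a container with no `TZ` set behaves as UTC. `changelog_date` is a hypothetical helper. The fixed UTC-8 offset stands in for `America/Los_Angeles` in February (PST); resolving the named zone itself requires the tz database, which is what `tzdata` supplies.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def changelog_date(instant: datetime, tz: tzinfo) -> date:
    """Hypothetical helper: the calendar date a date-stamping tool sees in `tz`."""
    return instant.astimezone(tz).date()


# Stand-in for America/Los_Angeles in February (PST, UTC-8). With tzdata
# installed, zoneinfo.ZoneInfo("America/Los_Angeles") would also handle DST.
PST = timezone(timedelta(hours=-8))

# 17:24:20 on Feb 11 in Los Angeles is already 01:24:20 on Feb 12 in UTC.
release = datetime(2026, 2, 12, 1, 24, 20, tzinfo=timezone.utc)

print(changelog_date(release, timezone.utc))  # 2026-02-12 (no TZ: assumed UTC)
print(changelog_date(release, PST))           # 2026-02-11 (TZ=America/Los_Angeles)
```

Because the zone is an explicit parameter, the same instant yields either date, depending only on the environment the tool inherits.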
Erich Blume	b197bd5f58	Adopt Dagger CI for docs build (Phase 2) (#157) ## Summary Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan). - Backfill `date-modified` frontmatter on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git. - New `docs-check-frontmatter` mise task + pre-commit hook — validates all docs have `title`, `tags`, and `date-modified` - New Dagger functions — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball) - Simplified CI workflow — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit. ### Design decisions - Towncrier runs twice in CI: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host. - Artifact hosting stays on Forgejo Releases (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently. - `date-modified` frontmatter preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates. ### Local iteration `dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz`, then `tar tf docs-dev.tar.gz | head -20` ## Deployment and Testing - [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages) - [x] Pre-commit hooks pass (including new `docs-check-frontmatter`) - [ ] Full `workflow_dispatch` run after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157	2026-02-11 16:33:16 -08:00
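The `docs-check-frontmatter` check can be approximated in a few lines of Python. This is a hypothetical sketch, not the mise task's actual implementation. It assumes frontmatter is delimited by `---` lines at the top of each file. It only checks that each required top-level key is present and does not parse YAML values.

```python
from pathlib import Path

# Keys named in the commit above; the checker itself is hypothetical.
REQUIRED_KEYS = ("title", "tags", "date-modified")


def missing_keys(text: str) -> list:
    """Return the required frontmatter keys absent from a Markdown document."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_KEYS)  # no frontmatter block at all
    found = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        # Top-level keys start in column 0; indented lines are nested values.
        if line[:1].strip() and ":" in line:
            found.add(line.split(":", 1)[0].strip())
    else:
        return list(REQUIRED_KEYS)  # opening '---' was never closed
    return [key for key in REQUIRED_KEYS if key not in found]


def check_docs(root: Path) -> dict:
    """Map each offending *.md path under `root` to its missing keys."""
    problems = {}
    for path in sorted(root.rglob("*.md")):
        missing = missing_keys(path.read_text(encoding="utf-8"))
        if missing:
            problems[str(path)] = missing
    return problems
```

A pre-commit hook built this way could call `check_docs(Path("docs"))` and exit non-zero whenever the result is non-empty.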
Erich Blume	1bc2b421a8	Adopt Dagger CI for container builds (Phase 1) (#156) ## Summary - Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images - Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml` - BuildKit's native push is compatible with Zot — skopeo workaround eliminated - Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0 - Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner) - Delete old `.forgejo/actions/build-push-image/` composite action - Add GPLv3 LICENSE ## Verified locally - `dagger call build --src=. --container-name=nettest` — builds ✓ - `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓ - `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓ - Dagger CLI accessible inside built runner image ✓ ## Deployment sequence (after merge) 1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner 2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in) 3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline 4. `mise run container-list` — verify tags ## Not included (future phases) - Phase 2: docs build + Forgejo packages migration - Phase 3: runner simplification (remove skopeo, Node.js, etc.) - Phase 4: future workflows Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156	2026-02-11 15:38:31 -08:00
Erich Blume	cef7611cba	Wrap fly ssh cache purge in sh -c for BusyBox. `fly ssh console -C` doesn't run through a shell, so `&&` was passed as literal arguments to `rm`. Wrap in `sh -c` to get proper shell parsing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 13:35:11 -08:00
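The failure mode is easy to reproduce locally. Python's `subprocess` also executes an argument vector directly unless told to use a shell, which matches how the commit describes `fly ssh console -C`. This is a local illustration that assumes a POSIX system with `echo` and `sh` on `PATH`. It does not invoke `fly`, and `run` is a hypothetical helper.

```python
import subprocess


def run(argv: list) -> str:
    """Run argv directly (no shell) and return its stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


# No shell: "&&" is just another argument handed to echo.
direct = run(["echo", "first", "&&", "echo", "second"])

# Wrapped in `sh -c`: the shell parses "&&" and runs both commands.
wrapped = run(["sh", "-c", "echo first && echo second"])

print(repr(direct))   # 'first && echo second\n'
print(repr(wrapped))  # 'first\nsecond\n'
```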
Erich Blume	0efcce2984	Purge Fly.io proxy cache after docs release (#154) ## Summary - The Fly.io nginx proxy caches docs responses for 24h (`proxy_cache_valid 200 1d`) - After a release, docs.eblu.me kept serving stale content until the cache expired - This caused v1.5.4 to show v1.5.3 on the CHANGELOG page - Adds `flyctl` install and `fly ssh console` cache purge steps to the build workflow, running after the ArgoCD deploy completes ## Test plan - [ ] Next release should show the correct version on docs.eblu.me/CHANGELOG immediately - [ ] Verify the `fly ssh console` command succeeds in the workflow logs Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/154	2026-02-11 13:33:26 -08:00
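A toy model of a TTL cache shows why v1.5.4 could keep showing v1.5.3. This is not nginx. `TTLCache` is a hypothetical stand-in that follows the `proxy_cache_valid 200 1d` rule: a cached response is reused until it is a day old, whatever the origin now serves, and a purge simply drops every entry.

```python
from typing import Callable

DAY = 24 * 60 * 60  # seconds, mirroring `proxy_cache_valid 200 1d`


class TTLCache:
    """Toy response cache (hypothetical): entries are served until `ttl` seconds old."""

    def __init__(self, origin: Callable[[str], str], ttl: int = DAY) -> None:
        self.origin = origin
        self.ttl = ttl
        self.entries = {}  # path -> (stored_at, body)

    def get(self, path: str, now: float) -> str:
        cached = self.entries.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # HIT: the origin is not consulted
        body = self.origin(path)  # MISS or EXPIRED: fetch and store
        self.entries[path] = (now, body)
        return body

    def purge(self) -> None:
        self.entries.clear()


site = {"/CHANGELOG": "v1.5.3"}
cache = TTLCache(lambda path: site[path])

cache.get("/CHANGELOG", now=0)          # first request caches v1.5.3
site["/CHANGELOG"] = "v1.5.4"           # release deploys new content
stale = cache.get("/CHANGELOG", now=60)
cache.purge()                           # analogous to the post-release purge step
fresh = cache.get("/CHANGELOG", now=120)
print(stale, fresh)  # v1.5.3 v1.5.4
```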
Erich Blume	64a78422b1	Add Fly.io public reverse proxy for docs.eblu.me (#120) ## Summary - Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale - First service exposed: `docs.eblu.me` — the Quartz static docs site - Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME - Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow ## Key details - Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed - Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts - nginx caches aggressively for the static site; health check is on the default_server block - ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only - DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev` ## Test plan - [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok` - [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status` - [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert - [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected) - [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120	2026-02-08 02:36:19 -08:00
Erich Blume	95610d8e54	Fix Quartz build to preserve git history for accurate file dates (#106) ## Summary Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings by using Quartz's `-d docs` flag instead of symlinking. ## Problem The previous approach symlinked `content -> docs`, but git doesn't follow symlinks. When Quartz asked git about `content/index.md`, git had no history for that path. ## Solution Use `npx quartz build -d docs` to tell Quartz to read from `docs/` directly. Now when Quartz asks git about `docs/index.md`, git finds the actual file history. - CHANGELOG.md is copied (not symlinked) into `docs/` for the build, then removed - All other files have accurate git-based dates ## Testing Tested locally - build produces no warnings. Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/106	2026-02-04 08:46:47 -08:00
Erich Blume	03bda41de4	Fix Quartz build to preserve git history for accurate file dates (#105) ## Summary Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings in the Build docs step by restructuring how Quartz builds the documentation. ## Problem Previously, we copied docs into Quartz's content folder. Since this was inside a fresh Quartz clone with no history of our files, the `CreatedModifiedDate` plugin couldn't determine accurate dates. ## Solution Build Quartz from within the blumeops repo instead: 1. Copy Quartz's build system (quartz/, package.json, etc.) into the workspace 2. Symlink `content` -> `docs` (preserves git history) 3. Symlink `docs/CHANGELOG.md` -> `../CHANGELOG.md` 4. Build from workspace root where git can trace file history 5. Clean up artifacts after creating tarball ## Deployment and Testing - [ ] Run build workflow and verify no "not tracked by git" warnings - [ ] Verify file dates appear correctly on built docs site Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/105	2026-02-04 08:25:46 -08:00
Erich Blume	efdd569285	Improve build workflow with version bump selection and changelog in releases (#104) ## Summary - Add `version_type` choice input with options: BUMP_PATCH (default), BUMP_MINOR, BUMP_MAJOR, SPECIFIC_VERSION - Add optional `specific_version` input for explicit version selection - Include changelog content in Forgejo release body under "What's Changed" section - Move CHANGELOG.md to repository root (still copied into docs during Quartz build) - Add CHANGELOG link to docs index page - Update doc-links script to recognize build-time docs from repo root ## Changes Workflow inputs: - Previously: single optional `version` string input - Now: `version_type` choice dropdown (defaults to BUMP_PATCH) + optional `specific_version` for explicit versions Release body: - Previously: just asset download instructions - Now: includes "What's Changed" section with changelog entries for this release CHANGELOG.md location: - Previously: `docs/CHANGELOG.md` - Now: `CHANGELOG.md` (repo root), copied into docs content during build ## Deployment and Testing - [ ] Run build workflow with BUMP_PATCH (default) - [ ] Run build workflow with BUMP_MINOR - [ ] Verify changelog appears in release body - [ ] Verify docs site includes CHANGELOG page Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/104	2026-02-04 08:13:16 -08:00
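The four `version_type` choices map naturally onto a small semantic-versioning helper. `next_version` is a hypothetical sketch, and the workflow's actual shell logic may differ. It assumes plain `MAJOR.MINOR.PATCH` tags with an optional leading `v`, matching tags such as `v1.5.4` elsewhere in this log.

```python
import re
from typing import Optional

# Assumed tag format: "v1.5.4" or "1.5.4".
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def next_version(current: str, version_type: str = "BUMP_PATCH",
                 specific_version: Optional[str] = None) -> str:
    """Hypothetical helper: compute the next release tag from the workflow inputs."""
    if version_type == "SPECIFIC_VERSION":
        if not specific_version or not _SEMVER.match(specific_version):
            raise ValueError("SPECIFIC_VERSION needs specific_version like v1.2.3")
        return "v" + specific_version.lstrip("v")
    match = _SEMVER.match(current)
    if match is None:
        raise ValueError(f"unrecognized version: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if version_type == "BUMP_MAJOR":
        return f"v{major + 1}.0.0"
    if version_type == "BUMP_MINOR":
        return f"v{major}.{minor + 1}.0"
    if version_type == "BUMP_PATCH":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version_type: {version_type!r}")
```

Making `BUMP_PATCH` the default argument mirrors the dropdown's default, so a run with no explicit choice produces a patch release.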
Erich Blume	82bcd935cd	Move DOCS_RELEASE_URL from ConfigMap to Deployment. This ensures an ArgoCD sync triggers a pod rollout when the URL changes, since ConfigMap data changes don't restart pods automatically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:23:52 -08:00
Erich Blume	1f73eb675d	Auto-deploy docs from build workflow (#93) ## Summary - Add `uv` and `argocd` CLI to forgejo-runner container image - Add `workflow-bot` ArgoCD account with sync permissions (declarative via kustomize patches) - Add `ARGOCD_AUTH_TOKEN` to forgejo-runner external secret for workflow auth - Update build workflow to auto-deploy docs after release: - Update configmap with new release URL - Commit changelog and configmap changes - Sync docs app via ArgoCD ## Deployment and Testing Manual steps required before this can work: 1. [ ] Build and push new forgejo-runner image (v2.4.0) 2. [ ] Sync argocd app to create workflow-bot account 3. [ ] Generate token: `argocd account generate-token --account workflow-bot` 4. [ ] Store token in 1Password under "Forgejo Secrets" with field `argocd_token` 5. [ ] Sync forgejo-runner app to pick up new external secret 6. [ ] Update forgejo-runner deployment to use new image version 7. [ ] Test by running workflow manually 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/93	2026-02-03 16:58:03 -08:00
Erich Blume	9a8587b83f	Add towncrier changelog system (#86) ## Summary - Configure towncrier with custom types (feature, bugfix, infra, doc, misc) - Build initial v0.1.0 changelog from zk management log entries - Integrate towncrier into build-blumeops workflow - Update README to mark Phase 1b complete ## How It Works 1. Add changelog fragments to `docs/changelog.d/` as `<id>.<type>.md` 2. When running build-blumeops workflow, towncrier collects fragments 3. CHANGELOG.md is updated and fragments are removed 4. Changes are committed back to main before docs build ## Testing - [x] Tested `uvx towncrier build` locally - [ ] Test workflow execution (after merge) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/86	2026-02-03 11:48:13 -08:00
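Fragment names follow the `<id>.<type>.md` convention described above, so a misnamed fragment can be caught before towncrier runs. `parse_fragment` is a hypothetical checker that enforces only that convention and the five custom types listed in this commit. towncrier applies its own parsing rules, which this sketch does not reproduce.

```python
from typing import NamedTuple

# The five custom types configured in this commit.
FRAGMENT_TYPES = {"feature", "bugfix", "infra", "doc", "misc"}


class Fragment(NamedTuple):
    id: str
    type: str


def parse_fragment(filename: str) -> Fragment:
    """Hypothetical helper: split 'docs/changelog.d' names like '157.infra.md'."""
    stem, sep, ext = filename.rpartition(".")
    if not sep or ext != "md":
        raise ValueError(f"{filename}: expected a .md file")
    frag_id, sep, frag_type = stem.rpartition(".")
    if not sep or not frag_id:
        raise ValueError(f"{filename}: expected <id>.<type>.md")
    if frag_type not in FRAGMENT_TYPES:
        raise ValueError(f"{filename}: unknown type {frag_type!r}")
    return Fragment(frag_id, frag_type)
```

Splitting from the right means ids may themselves contain dots or hyphens; only the last two components are treated as structure.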
Erich Blume	95a82321ee	Add authentication and error logging to release creation. - Add Authorization header using GITHUB_TOKEN - Remove silent fail flag to see error responses - Log API responses for debugging Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:33:47 -08:00
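An authenticated release-creation request can be sketched with Python's standard library. This assumes Forgejo's Gitea-compatible `POST /api/v1/repos/{owner}/{repo}/releases` endpoint and the `Authorization: token <TOKEN>` header form. The helper `release_request` is hypothetical and only builds the request, so it can be inspected without a server. `forge.example` and `dummy` are placeholders.

```python
import json
import urllib.request


def release_request(base_url: str, owner: str, repo: str, token: str,
                    tag: str, body: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a create-release request."""
    payload = json.dumps({"tag_name": tag, "name": tag, "body": body}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/v1/repos/{owner}/{repo}/releases",
        data=payload,
        method="POST",
    )
    req.add_header("Authorization", f"token {token}")
    req.add_header("Content-Type", "application/json")
    return req


req = release_request("https://forge.example", "eblume", "blumeops",
                      token="dummy", tag="v1.5.4", body="## What's Changed")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on a non-2xx status. Calling `err.read()` on that exception returns the API's error body, which is the Python counterpart of logging the response instead of failing silently.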
Erich Blume	8780928b9a	Use GitHub upstream for Quartz until mirror is fixed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:17:40 -08:00
Erich Blume	f11f8a4e89	Fix workflow to handle no existing releases. Remove the `-f` flag from curl so a 404 on `/releases/latest` doesn't fail the script when there are no releases yet. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:15:53 -08:00
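The "no releases yet" case can also be handled explicitly, rather than by relaxing error handling for every status code. In the hypothetical helper below, a 404 from the latest-release endpoint means there is nothing to bump from, while any other HTTP error still propagates. The `opener` parameter and `fake_opener` exist only so the behaviour can be exercised without a server.

```python
import io
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def latest_release_tag(url: str,
                       opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Hypothetical helper: latest release's tag_name, or None if there are none."""
    try:
        with opener(url) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no releases yet: expected on the very first build
        raise  # any other status is a real failure
    return data["tag_name"]


def fake_opener(status: int, payload: dict) -> Callable:
    """Hypothetical stand-in for urlopen, for offline experiments."""
    def opener(url: str):
        if status != 200:
            raise urllib.error.HTTPError(url, status, "error", None, None)
        return io.BytesIO(json.dumps(payload).encode())
    return opener


url = "https://forge.example/api/v1/repos/eblume/blumeops/releases/latest"
print(latest_release_tag(url, fake_opener(404, {})))  # None
```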
Erich Blume	b8104d75ad	Move zk cards to docs/zk/ for documentation restructuring (#84) ## Summary - Move all existing zettelkasten cards from `docs/` to `docs/zk/` as a temporary holding area - Update `zk-docs` mise task to look in the new location - Add `docs/README.md` explaining the Diataxis-based restructuring plan and target audiences ## Context This is phase 1 of a multi-phase documentation restructuring effort. The goal is to reorganize docs to follow the Diataxis framework while serving multiple audiences: 1. Erich (owner) - knowledge graph/zk 2. Claude/AI agents - memory and context enrichment 3. New external readers - high-level overview 4. Potential operators/contributors - onboarding 5. Replicators - people wanting to duplicate the approach ## Testing - [x] Verified `mise run zk-docs` still works with the new path - [x] Updated obsidian.nvim config (in ~/.config/nvim) to point to new path ## Note The obsidian.nvim config change is outside this repo but was made as part of this work. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/84	2026-02-03 09:13:50 -08:00
Erich Blume	fd29244854	Simplify CI: remove Tailscale sidecar, use skopeo for push (#74) ## Summary - Remove Tailscale sidecar from build-push-image action - registry.ops.eblu.me is directly reachable from k8s pods via Caddy - Use skopeo for pushing images instead of docker push - Docker 27's manifest format has compatibility issues with zot registry - Remove tailscale_authkey secret requirement from workflows ## Deployment and Testing - [x] Tested with nettest-v0.10.0 tag - build succeeded and image pushed to registry 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/74	2026-01-30 10:18:20 -08:00
Erich Blume	ea42362b6f	Migrate Forgejo runner to Kubernetes with DinD (#60) ## Summary - Deploy Forgejo runner to k8s with Docker-in-Docker sidecar - Add job execution image with Node.js and Docker CLI - Retire host-mode runner on indri - All CI jobs now run containerized in k8s ## Components Added - `containers/forgejo-runner/Dockerfile` - Job execution image - `argocd/apps/forgejo-runner.yaml` - ArgoCD Application - `argocd/manifests/forgejo-runner/` - Kubernetes manifests ## Components Removed - `ansible/roles/forgejo_runner/` - No longer needed ## Changes to Existing Files - `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env - `.github/actionlint.yaml` - Only `k8s` label now valid ## Deployment 1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -` 2. Sync ArgoCD: `argocd app sync forgejo-runner` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60	2026-01-25 19:56:17 -08:00
Erich Blume	424647cd93	Use Tailscale sidecar for container registry push. Docker Desktop's VM can't resolve tailnet hostnames. Work around this by: 1. Starting a Tailscale container that joins the tailnet 2. Building the image with docker build 3. Saving to tarball with docker save 4. Pushing via skopeo inside the Tailscale container. Uses the TS_CI_GATEWAY_AUTHKEY repository secret for authentication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 19:29:01 -08:00
Erich Blume	31697b4d63	Add nettest container for CI/CD network debugging (#52) ## Summary - Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script - Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags - Test script checks DNS resolution and HTTPS connectivity to forge and registry ## Deployment and Testing - [ ] Merge PR to main - [ ] Run `mise run container-release nettest v0.1.0` to trigger first build - [ ] Verify workflow runs successfully and container can reach tailnet services - [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52	2026-01-24 16:54:35 -08:00
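The nettest script's two checks, DNS resolution and connectivity, can be sketched with Python's `socket` module. This is an illustrative stand-in, not the container's actual script. `resolves` and `can_connect` are hypothetical helpers, and a plain TCP connect is used as a simpler proxy for a full HTTPS request.

```python
import socket


def resolves(host: str) -> bool:
    """True if `host` resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable
        return False
```

From inside a pod, calling `resolves("registry.tail8d86e.ts.net")` and then `can_connect("registry.tail8d86e.ts.net", 443)` would separate a DNS failure from a routing or firewall failure. This split is what makes the check useful for network debugging.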
Erich Blume	8ca8798121	Switch to Buildah for container builds (#51) ## Summary - Replace Docker with Buildah for container image builds - No Docker socket required - buildah is daemonless - Cleaner security model (no privileged containers or socket mounting) - Remove Docker-related security context from deployment ## Changes - Update Dockerfile to install buildah/podman instead of docker-cli - Configure buildah storage with overlay driver and fuse-overlayfs - Update composite action to use `buildah bud` and `buildah push` - Add `imagePullPolicy: Always` to ensure fresh image pulls - Update test workflow to verify buildah/podman ## Testing - [ ] Runner pod starts successfully - [ ] Buildah is available in runner - [ ] Test workflow verifies buildah/podman versions - [ ] Container build workflow builds and pushes to zot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51	2026-01-24 13:30:26 -08:00
Erich Blume	5fcd122494	Reorganize CI/CD bootstrap phases and add custom runner Dockerfile (#50) ## Summary - Reorder CI/CD bootstrap phases to address chicken-and-egg problem - P2 is now "Custom Runner Image" (stock runner lacks Node.js) - Add P3 for "Mirror Forgejo & Build from Source" - Rename P3 -> P4 (Self-Deploy), P4 -> P5 (Container Builds) - Add Dockerfile for custom runner with Node.js, npm, docker, build tools - Update overview with new phase structure, host mode notes, and cross-compilation challenge ## Key Changes ### Phase Reordering (old -> new, name): P1 -> P1, Enable Actions (complete); P2 -> P2, Custom Runner Image (new focus); none -> P3, Mirror Forgejo & Build (new); P3 -> P4, Self-Deploy; P4 -> P5, Container Builds ### Custom Runner Dockerfile The stock `forgejo/runner:3.5.1` image lacks Node.js, so `actions/checkout@v4` doesn't work. The new Dockerfile adds: - Node.js + npm (for GitHub Actions) - Docker CLI (for container builds) - Build tools (gcc, make, curl, jq) ### Bootstrap Strategy 1. Build custom runner image manually on gilbert (podman build) 2. Push to zot registry 3. Update deployment to use custom image 4. Then enable auto-build workflow for runner ## Deployment and Testing - [x] Review plan changes - [x] Build custom runner image manually and verify - [x] Update runner deployment - [x] Test `actions/checkout@v4` works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/50	2026-01-23 18:50:27 -08:00
Erich Blume	3bcad4189f	Add actionlint pre-commit hook for workflow validation (#49) ## Summary - Fix workflow to use `github.` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.` aliases) - Pass untrusted inputs through environment variables (security best practice per actionlint) - Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally ## Deployment and Testing - [x] Pre-commit hooks all pass - [x] actionlint validates `.forgejo/workflows/test.yaml` successfully - [ ] Verify workflow runs without errors on Forge after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49	2026-01-23 17:56:24 -08:00
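The reason for passing untrusted inputs through environment variables can be shown outside CI. In this sketch (Python, assuming a POSIX `sh`), the same hostile string is handled two ways. It is either spliced into the script text, as a workflow expression is when it is expanded into a `run:` script, or handed to the shell as an environment variable. Only the first lets it run commands. `sh` here is a hypothetical helper, and the input string is made up.

```python
import os
import subprocess
from typing import Optional

untrusted = 'x"; echo injected; echo "'


def sh(script: str, env: Optional[dict] = None) -> str:
    """Run `script` with sh -c and return its stdout."""
    full_env = {**os.environ, **(env or {})}
    return subprocess.run(["sh", "-c", script], env=full_env,
                          capture_output=True, text=True).stdout


# Spliced into the script text: the quote closes early and the injected
# command runs.
spliced = sh(f'printf "%s" "{untrusted}"')

# Passed as data via the environment: the shell never parses it as code.
via_env = sh('printf "%s" "$INPUT"', env={"INPUT": untrusted})

print(repr(spliced))  # 'xinjected\n\n'
print(repr(via_env))  # 'x"; echo injected; echo "'
```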
Erich Blume	7893c41020	Enable Forgejo Actions (Phase 1) (#48) ## Summary - Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password - Enable Forgejo Actions in config (`[actions] ENABLED = true`) - Add `repo.actions` to DEFAULT_REPO_UNITS - Clean up unused MySQL database fields (we use SQLite) ## Phase 1 Progress This PR covers the first part of Phase 1 (ci-cd-bootstrap plan): - [x] Refactor app.ini to ansible template - [x] Store secrets in 1Password - [x] Enable Actions in config - [ ] Deploy config changes (pending review) - [ ] Create runner registration token - [ ] Deploy runner to k8s - [ ] Test with simple workflow ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags forgejo` to deploy - [ ] Verify Forgejo restarts correctly - [ ] Verify Actions tab appears in repo settings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48	2026-01-23 17:00:12 -08:00
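After the templated `app.ini` is rendered, the two Actions-related settings from this change can be checked with `configparser`. `actions_ready` is a hypothetical verification sketch with two assumptions. First, `DEFAULT_REPO_UNITS` sits in the `[repository]` section as a comma-separated list. Second, keys appear with the upper-case spelling used above. Top-level keys such as `APP_NAME`, which `configparser` would otherwise reject, are placed under a dummy section first.

```python
import configparser


def actions_ready(ini_text: str) -> bool:
    """Hypothetical check: Actions enabled and repo.actions in the default units."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep Forgejo's upper-case key names as written
    # app.ini may begin with top-level keys (e.g. APP_NAME); give them a section.
    parser.read_string("[__top__]\n" + ini_text)
    enabled = parser.getboolean("actions", "ENABLED", fallback=False)
    units = parser.get("repository", "DEFAULT_REPO_UNITS", fallback="")
    return enabled and "repo.actions" in {u.strip() for u in units.split(",")}
```

Disabling interpolation avoids surprises from literal `%` characters, which can appear in secrets or URLs in a real `app.ini`.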

25 commits