Commit graph

23 commits

Author SHA1 Message Date
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
1bc2b421a8 Adopt Dagger CI for container builds (Phase 1) (#156)
All checks were successful
Build Container / build (push) Successful in 13s
## Summary

- Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images
- Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml`
- BuildKit's native push is compatible with Zot — **skopeo workaround eliminated**
- Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0
- Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner)
- Delete old `.forgejo/actions/build-push-image/` composite action
- Add GPLv3 LICENSE

## Verified locally

- `dagger call build --src=. --container-name=nettest` — builds ✓
- `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓
- `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓
- Dagger CLI accessible inside built runner image ✓

## Deployment sequence (after merge)

1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner
2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in)
3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline
4. `mise run container-list` — verify tags

## Not included (future phases)

- Phase 2: docs build + Forgejo packages migration
- Phase 3: runner simplification (remove skopeo, Node.js, etc.)
- Phase 4: future workflows

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156
2026-02-11 15:38:31 -08:00
cef7611cba Wrap fly ssh cache purge in sh -c for BusyBox
fly ssh console -C doesn't run through a shell, so && was passed as
literal arguments to rm. Wrap in sh -c to get proper shell parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 13:35:11 -08:00
0efcce2984 Purge Fly.io proxy cache after docs release (#154)
## Summary
- The Fly.io nginx proxy caches docs responses for 24h (`proxy_cache_valid 200 1d`)
- After a release, docs.eblu.me kept serving stale content until the cache expired
- This caused v1.5.4 to show v1.5.3 on the CHANGELOG page
- Adds `flyctl` install and `fly ssh console` cache purge steps to the build workflow, running after the ArgoCD deploy completes

## Test plan
- [ ] Next release should show the correct version on docs.eblu.me/CHANGELOG immediately
- [ ] Verify the `fly ssh console` command succeeds in the workflow logs

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/154
2026-02-11 13:33:26 -08:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
95610d8e54 Fix Quartz build to preserve git history for accurate file dates (#106)
## Summary

Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings by using Quartz's `-d docs` flag instead of symlinking.

## Problem

The previous approach symlinked `content -> docs`, but git doesn't follow symlinks. When Quartz asked git about `content/index.md`, git had no history for that path.

## Solution

Use `npx quartz build -d docs` to tell Quartz to read from `docs/` directly. Now when Quartz asks git about `docs/index.md`, git finds the actual file history.

- CHANGELOG.md is copied (not symlinked) into `docs/` for the build, then removed
- All other files have accurate git-based dates

## Testing

Tested locally - build produces no warnings.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/106
2026-02-04 08:46:47 -08:00
03bda41de4 Fix Quartz build to preserve git history for accurate file dates (#105)
## Summary

Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings in the Build docs step by restructuring how Quartz builds the documentation.

## Problem

Previously, we copied docs into Quartz's content folder. Since this was inside a fresh Quartz clone with no history of our files, the `CreatedModifiedDate` plugin couldn't determine accurate dates.

## Solution

Build Quartz from within the blumeops repo instead:
1. Copy Quartz's build system (quartz/, package.json, etc.) into the workspace
2. Symlink `content` -> `docs` (preserves git history)
3. Symlink `docs/CHANGELOG.md` -> `../CHANGELOG.md`
4. Build from workspace root where git can trace file history
5. Clean up artifacts after creating tarball

## Deployment and Testing

- [ ] Run build workflow and verify no "not tracked by git" warnings
- [ ] Verify file dates appear correctly on built docs site

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/105
2026-02-04 08:25:46 -08:00
efdd569285 Improve build workflow with version bump selection and changelog in releases (#104)
## Summary

- Add `version_type` choice input with options: BUMP_PATCH (default), BUMP_MINOR, BUMP_MAJOR, SPECIFIC_VERSION
- Add optional `specific_version` input for explicit version selection
- Include changelog content in Forgejo release body under "What's Changed" section
- Move CHANGELOG.md to repository root (still copied into docs during Quartz build)
- Add CHANGELOG link to docs index page
- Update doc-links script to recognize build-time docs from repo root

## Changes

**Workflow inputs:**
- Previously: single optional `version` string input
- Now: `version_type` choice dropdown (defaults to BUMP_PATCH) + optional `specific_version` for explicit versions

**Release body:**
- Previously: just asset download instructions
- Now: includes "What's Changed" section with changelog entries for this release

**CHANGELOG.md location:**
- Previously: `docs/CHANGELOG.md`
- Now: `CHANGELOG.md` (repo root), copied into docs content during build

## Deployment and Testing

- [ ] Run build workflow with BUMP_PATCH (default)
- [ ] Run build workflow with BUMP_MINOR
- [ ] Verify changelog appears in release body
- [ ] Verify docs site includes CHANGELOG page

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/104
2026-02-04 08:13:16 -08:00
82bcd935cd Move DOCS_RELEASE_URL from ConfigMap to Deployment
This ensures ArgoCD sync triggers a pod rollout when the URL changes,
since ConfigMap data changes don't restart pods automatically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:23:52 -08:00
1f73eb675d Auto-deploy docs from build workflow (#93)
## Summary
- Add `uv` and `argocd` CLI to forgejo-runner container image
- Add `workflow-bot` ArgoCD account with sync permissions (declarative via kustomize patches)
- Add `ARGOCD_AUTH_TOKEN` to forgejo-runner external secret for workflow auth
- Update build workflow to auto-deploy docs after release:
  - Update configmap with new release URL
  - Commit changelog and configmap changes
  - Sync docs app via ArgoCD

## Deployment and Testing
Manual steps required before this can work:
1. [ ] Build and push new forgejo-runner image (v2.4.0)
2. [ ] Sync argocd app to create workflow-bot account
3. [ ] Generate token: `argocd account generate-token --account workflow-bot`
4. [ ] Store token in 1Password under "Forgejo Secrets" with field `argocd_token`
5. [ ] Sync forgejo-runner app to pick up new external secret
6. [ ] Update forgejo-runner deployment to use new image version
7. [ ] Test by running workflow manually

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/93
2026-02-03 16:58:03 -08:00
9a8587b83f Add towncrier changelog system (#86)
## Summary
- Configure towncrier with custom types (feature, bugfix, infra, doc, misc)
- Build initial v0.1.0 changelog from zk management log entries
- Integrate towncrier into build-blumeops workflow
- Update README to mark Phase 1b complete

## How It Works
1. Add changelog fragments to `docs/changelog.d/` as `<id>.<type>.md`
2. When running build-blumeops workflow, towncrier collects fragments
3. CHANGELOG.md is updated and fragments are removed
4. Changes are committed back to main before docs build

## Testing
- [x] Tested `uvx towncrier build` locally
- [ ] Test workflow execution (after merge)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/86
2026-02-03 11:48:13 -08:00
95a82321ee Add authentication and error logging to release creation
- Add Authorization header using GITHUB_TOKEN
- Remove silent fail flag to see error responses
- Log API responses for debugging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 09:33:47 -08:00
8780928b9a Use GitHub upstream for Quartz until mirror is fixed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 09:17:40 -08:00
f11f8a4e89 Fix workflow to handle no existing releases
Remove -f flag from curl so 404 on /releases/latest doesn't fail the
script when there are no releases yet.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 09:15:53 -08:00
b8104d75ad Move zk cards to docs/zk/ for documentation restructuring (#84)
## Summary
- Move all existing zettelkasten cards from `docs/` to `docs/zk/` as a temporary holding area
- Update `zk-docs` mise task to look in the new location
- Add `docs/README.md` explaining the Diataxis-based restructuring plan and target audiences

## Context
This is phase 1 of a multi-phase documentation restructuring effort. The goal is to reorganize docs to follow the Diataxis framework while serving multiple audiences:
1. Erich (owner) - knowledge graph/zk
2. Claude/AI agents - memory and context enrichment
3. New external readers - high-level overview
4. Potential operators/contributors - onboarding
5. Replicators - people wanting to duplicate the approach

## Testing
- [x] Verified `mise run zk-docs` still works with the new path
- [x] Updated obsidian.nvim config (in ~/.config/nvim) to point to new path

## Note
The obsidian.nvim config change is outside this repo but was made as part of this work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/84
2026-02-03 09:13:50 -08:00
fd29244854 Simplify CI: remove Tailscale sidecar, use skopeo for push (#74)
## Summary
- Remove Tailscale sidecar from build-push-image action - registry.ops.eblu.me is directly reachable from k8s pods via Caddy
- Use skopeo for pushing images instead of docker push - Docker 27's manifest format has compatibility issues with zot registry
- Remove tailscale_authkey secret requirement from workflows

## Deployment and Testing
- [x] Tested with nettest-v0.10.0 tag - build succeeded and image pushed to registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/74
2026-01-30 10:18:20 -08:00
ea42362b6f Migrate Forgejo runner to Kubernetes with DinD (#60)
## Summary
- Deploy Forgejo runner to k8s with Docker-in-Docker sidecar
- Add job execution image with Node.js and Docker CLI
- Retire host-mode runner on indri
- All CI jobs now run containerized in k8s

## Components Added
- `containers/forgejo-runner/Dockerfile` - Job execution image
- `argocd/apps/forgejo-runner.yaml` - ArgoCD Application
- `argocd/manifests/forgejo-runner/` - Kubernetes manifests

## Components Removed
- `ansible/roles/forgejo_runner/` - No longer needed

## Changes to Existing Files
- `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env
- `.github/actionlint.yaml` - Only `k8s` label now valid

## Deployment
1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -`
2. Sync ArgoCD: `argocd app sync forgejo-runner`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60
2026-01-25 19:56:17 -08:00
424647cd93 Use Tailscale sidecar for container registry push
Some checks failed
Build Container / build (push) Failing after 1m9s
Docker Desktop's VM can't resolve tailnet hostnames. Work around this by:
1. Starting a Tailscale container that joins the tailnet
2. Building the image with docker build
3. Saving to tarball with docker save
4. Pushing via skopeo inside the Tailscale container

Uses TS_CI_GATEWAY_AUTHKEY repository secret for authentication.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:29:01 -08:00
31697b4d63 Add nettest container for CI/CD network debugging (#52)
Some checks failed
Build Container / build (push) Failing after 18s
## Summary
- Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script
- Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags
- Test script checks DNS resolution and HTTPS connectivity to forge and registry

## Deployment and Testing
- [ ] Merge PR to main
- [ ] Run `mise run container-release nettest v0.1.0` to trigger first build
- [ ] Verify workflow runs successfully and container can reach tailnet services
- [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52
2026-01-24 16:54:35 -08:00
8ca8798121 Switch to Buildah for container builds (#51)
All checks were successful
Test CI / test (push) Successful in 4s
## Summary
- Replace Docker with Buildah for container image builds
- No Docker socket required - buildah is daemonless
- Cleaner security model (no privileged containers or socket mounting)
- Remove Docker-related security context from deployment

## Changes
- Update Dockerfile to install buildah/podman instead of docker-cli
- Configure buildah storage with overlay driver and fuse-overlayfs
- Update composite action to use `buildah bud` and `buildah push`
- Add `imagePullPolicy: Always` to ensure fresh image pulls
- Update test workflow to verify buildah/podman

## Testing
- [ ] Runner pod starts successfully
- [ ] Buildah is available in runner
- [ ] Test workflow verifies buildah/podman versions
- [ ] Container build workflow builds and pushes to zot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51
2026-01-24 13:30:26 -08:00
5fcd122494 Reorganize CI/CD bootstrap phases and add custom runner Dockerfile (#50)
All checks were successful
Test CI / test (push) Successful in 2s
## Summary
- Reorder CI/CD bootstrap phases to address chicken-and-egg problem
- P2 is now "Custom Runner Image" (stock runner lacks Node.js)
- Add P3 for "Mirror Forgejo & Build from Source"
- Rename P3 -> P4 (Self-Deploy), P4 -> P5 (Container Builds)
- Add Dockerfile for custom runner with Node.js, npm, docker, build tools
- Update overview with new phase structure, host mode notes, and cross-compilation challenge

## Key Changes

### Phase Reordering
| Old | New | Name |
|-----|-----|------|
| P1 | P1 | Enable Actions (complete) |
| P2 | P2 | **Custom Runner Image** (new focus) |
| - | P3 | **Mirror Forgejo & Build** (new) |
| P3 | P4 | Self-Deploy |
| P4 | P5 | Container Builds |

### Custom Runner Dockerfile
The stock `forgejo/runner:3.5.1` image lacks Node.js, so `actions/checkout@v4` doesn't work. The new Dockerfile adds:
- Node.js + npm (for GitHub Actions)
- Docker CLI (for container builds)
- Build tools (gcc, make, curl, jq)

### Bootstrap Strategy
1. Build custom runner image manually on gilbert (podman build)
2. Push to zot registry
3. Update deployment to use custom image
4. Then enable auto-build workflow for runner

## Deployment and Testing
- [x] Review plan changes
- [x] Build custom runner image manually and verify
- [x] Update runner deployment
- [x] Test `actions/checkout@v4` works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/50
2026-01-23 18:50:27 -08:00
3bcad4189f Add actionlint pre-commit hook for workflow validation (#49)
All checks were successful
Test CI / test (push) Successful in 0s
## Summary
- Fix workflow to use `github.*` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.*` aliases)
- Pass untrusted inputs through environment variables (security best practice per actionlint)
- Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally

## Deployment and Testing
- [x] Pre-commit hooks all pass
- [x] actionlint validates `.forgejo/workflows/test.yaml` successfully
- [ ] Verify workflow runs without errors on Forge after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49
2026-01-23 17:56:24 -08:00
7893c41020 Enable Forgejo Actions (Phase 1) (#48)
All checks were successful
Test CI / test (push) Successful in 0s
## Summary
- Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password
- Enable Forgejo Actions in config (`[actions] ENABLED = true`)
- Add `repo.actions` to DEFAULT_REPO_UNITS
- Clean up unused MySQL database fields (we use SQLite)

## Phase 1 Progress
This PR covers the first part of Phase 1 (ci-cd-bootstrap plan):
- [x] Refactor app.ini to ansible template
- [x] Store secrets in 1Password
- [x] Enable Actions in config
- [ ] Deploy config changes (pending review)
- [ ] Create runner registration token
- [ ] Deploy runner to k8s
- [ ] Test with simple workflow

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags forgejo` to deploy
- [ ] Verify Forgejo restarts correctly
- [ ] Verify Actions tab appears in repo settings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48
2026-01-23 17:00:12 -08:00