Commit graph

159 commits

Author SHA1 Message Date
b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
faebc98c3c Fix blumeops-tasks for Todoist API v1 migration (#155)
## Summary
- Migrate from deprecated Todoist REST API v2 (`410 Gone`) to new unified API v1
- Add cursor-based pagination for project and task listing endpoints
- Switch 1Password credential retrieval from `op item get --fields` to `op read`

## Testing
- [x] `mise run blumeops-tasks` returns all 9 tasks successfully
- [x] Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/155
2026-02-11 14:33:37 -08:00
0dce806107 Add plan and reference card for UniFi Express 7 Pulumi stack (#145)
## Summary
- Rewrites the UniFi Pulumi plan doc to use filipowm/unifi Terraform provider via `pulumi package add terraform-provider` (replaces pulumiverse_unifi approach)
- Adds network segmentation goals (main/guest/IoT WiFi zones) and API key auth
- Creates UniFi reference card (`docs/reference/infrastructure/unifi.md`) with topology diagram
- Updates all documentation indexes (plans.md, how-to.md, hosts.md, reference.md)

## What's Deferred
Actual stack scaffolding (`pulumi/unifi/`), mise tasks, and `pulumi import` are blocked on switch purchase and cabling. The plan doc captures everything needed for a future execution session.

## Verification
- `docs-check-links` passes (all wiki-links resolve)
- `docs-check-index` passes (unifi.md referenced in reference.md)
- Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/145
2026-02-10 15:36:13 -08:00
54afa0750b Add how-to guide for restoring 1Password backup from borgmatic (#141)
## Summary
- New how-to guide at `docs/how-to/restore-1password-backup.md` with step-by-step procedure for extracting and decrypting a 1Password `.1pux` export from borgmatic backup
- **End-to-end verified**: extracted from today's borg archive, decrypted age key with openssl, decrypted .1pux with age → valid 31MB zip with vault data
- Cross-links added from: disaster-recovery, 1password, borgmatic, backups policy, and how-to index
- Updated disaster-recovery.md from TBD stub to include a procedures table

## Deployment and Testing
- [x] Verified full extraction + decryption flow against live borgmatic archive
- [x] `docs-check-links` passes — all wiki-links valid
- [ ] Review guide for clarity and completeness

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/141
2026-02-10 10:55:00 -08:00
a5765f9cf2 Add op-backup mise task for encrypted 1Password disaster recovery (#136)
## Summary
- Adds `mise run op-backup` task that encrypts a 1Password .1pux export with `age` using the master password + secret key as passphrase, SCPs to indri for borgmatic pickup, then deletes the plaintext
- Adds `age` to the Brewfile
- Borgmatic already backs up `/Users/erichblume/Documents` on indri, which covers the `1password-backup/` subdirectory — no config change needed

## Disaster recovery
1. Restore borgmatic archive to retrieve the `.age` file
2. Open Emergency Kit from safety deposit box
3. `age --decrypt <file>.age > export.1pux` (passphrase: `{master_password}:{secret_key}`)
4. Open `.1pux` with 1Password or unzip to inspect

## Usage
```
# Export all vaults from 1Password desktop app as .1pux, then:
mise run op-backup ~/Documents/1Password-export.1pux

# Or run without args for interactive prompt:
mise run op-backup
```

## Test plan
- [ ] `brew install age`
- [ ] Export a test vault from 1Password as .1pux
- [ ] Run `mise run op-backup` with the export path
- [ ] Verify encrypted file appears on indri at `~/Documents/1password-backup/`
- [ ] Verify plaintext .1pux is deleted from gilbert
- [ ] Test decryption: `age --decrypt <file>.age > test.1pux` with password:secret_key
- [ ] Verify decrypted .1pux can be opened/unzipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/136
2026-02-09 20:37:39 -08:00
85e36cd807 Operations and observability for sifaka NAS (#135)
## Summary
- Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring
- Formalize existing `node_exporter` container under Ansible management
- Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus
- Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
- Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS
- Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles

## Prerequisites before deploy
- [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
- [ ] Verify `ssh eblume@sifaka 'docker ps'` works
- [ ] Run `mise run provision-sifaka` to deploy containers
- [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes
- [ ] `argocd app sync prometheus` + `argocd app sync grafana-config`

## Test plan
- [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics`
- [ ] Verify Prometheus targets page shows both sifaka jobs as UP
- [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135
2026-02-09 17:44:05 -08:00
9e361cf38f Add docs-review task with last-reviewed frontmatter tracking (#129)
## Summary
- New `docs-review` mise task replaces `docs-review-random` — sorts docs by `last-reviewed` frontmatter field (never-reviewed first, then oldest)
- Updated review-documentation how-to to explain the new workflow and how to mark cards as reviewed
- Updated ai-assistance-guide task table to reference `docs-review`

## Test plan
- [x] `mise run docs-review` runs and shows staleness table + most stale doc
- [x] `mise run docs-review -- --limit 5` respects the limit flag
- [x] All pre-commit checks pass (links, index, filenames)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/129
2026-02-09 07:29:45 -08:00
a0b076172f Fix Immich/Homepage Ingress host matching, add missing service checks (#127)
## Summary

- Fix Immich Ingress `host: photos` causing 404 with ProxyGroup (same FQDN mismatch as Prometheus/Loki)
- Migrate Homepage from old per-service Tailscale proxy to shared ProxyGroup (was the last holdout)
- Add Immich and Navidrome to `services-check` HTTP endpoints

## Deployment Notes

- Already tested on branch: Immich and Homepage both return 200 via Caddy
- Homepage's old Helm-managed Ingress was deleted manually; ArgoCD may recreate it on sync — prune with `argocd app sync homepage --prune` after merge
- Old per-service `ts-homepage-*` pod in tailscale namespace can be cleaned up after confirming ProxyGroup works

## Test Plan

- [x] `curl https://photos.ops.eblu.me/` returns 200
- [x] `curl https://go.ops.eblu.me/` returns 200
- [ ] `mise run services-check` fully passes after merge

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/127
2026-02-08 22:12:50 -08:00
234c46c302 Filter blumeops-tasks to hide future-dated tasks (#124)
## Summary
- Tasks with a due date are now only shown when due today or earlier
- Recurring tasks stay hidden until their next occurrence is actionable
- Tasks without a due date continue to always display

## Test plan
- [x] Ran `mise run blumeops-tasks` — verified 18 undated tasks display correctly
- [x] Confirmed "BlumeOps doc review" (due tomorrow) is correctly hidden

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/124
2026-02-08 10:38:44 -08:00
61862b44c0 Add docs.eblu.me and Fly.io to services-check (#122)
## Summary

- Add public docs (`docs.eblu.me`) and Fly.io healthz checks to `mise run services-check`

## Test plan

- [ ] `mise run services-check` includes new "Public services (via Fly.io)" section

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/122
2026-02-08 02:48:15 -08:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
fbedaf2833 docs/expose-service-publicly pt2 - fly.io (#119)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/119
2026-02-08 00:38:27 -08:00
a7d6d44d3d Remove title slug check and test duplicate titles (#116)
## Summary
- Remove `docs-check-titles` pre-commit hook and mise task — wiki-links resolve by filename stem, not frontmatter title, so slug-format titles and uniqueness aren't needed
- Add two test cards (`title-test-alpha`, `title-test-beta`) with identical `title: Title Test Card` to verify duplicate titles don't break Quartz or obsidian.nvim
- Retitle `index.md` from `blumeops-documentation` to `BlumeOps`
- Add GitHub and Forgejo repo links to homepage intro

## Test plan
- [ ] Deploy to docs site and verify both test cards render and cross-link correctly
- [ ] Verify homepage title renders as "BlumeOps"
- [ ] Verify repo links on homepage work
- [ ] After confirming, remove test cards in a follow-up

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/116
2026-02-07 21:26:18 -08:00
cb343a2e35 Rename doc-* mise tasks to docs-check-* / docs-review-* (#113)
## Summary
- Rename 4 automated-check tasks: `doc-titles` → `docs-check-titles`, `doc-filenames` → `docs-check-filenames`, `doc-links` → `docs-check-links`, `doc-index` → `docs-check-index`
- Rename 3 interactive-review tasks: `doc-random` → `docs-review-random`, `doc-tags` → `docs-review-tags`, `doc-stale` → `docs-review-stale`
- Update all references in `.pre-commit-config.yaml`, `ai-assistance-guide.md`, and `review-documentation.md`
- Historical changelog entries left as-is

## Test plan
- [x] `mise run docs-check-titles` exits 0
- [x] `mise run docs-check-links` exits 0
- [x] `mise run docs-review-tags` exits 0
- [x] `mise run doc-titles` fails with "no task found"
- [x] All pre-commit hooks pass (including renamed hook IDs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/113
2026-02-06 07:08:46 -08:00
060c7a24e3 Review exploring-the-docs and add doc consistency checks (#112)
## Summary
- Reviewed and cleaned up exploring-the-docs tutorial: simplified wiki-links, fixed broken replication/ reference, added Related section, corrected zk-docs flags to match CLAUDE.md
- Added orphan detection to doc-links (finds docs not linked from any other doc)
- Added new doc tooling: `doc-index` (checks category index coverage), `doc-stale` (staleness report), `doc-tags` (tag inventory)
- Added `doc-index` as a pre-commit hook
- Updated use-pypi-proxy to document env-var-based proxy toggle for pip/uv
- Updated ai-assistance-guide with new doc task descriptions

## Test plan
- [ ] Run `mise run doc-links` — passes
- [ ] Run `mise run doc-index` — passes
- [ ] Run `mise run doc-stale` — informational output
- [ ] Run `mise run doc-tags` — informational output
- [ ] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/112
2026-02-05 21:12:06 -08:00
ab1386b015 Fix zk-docs 2026-02-04 17:31:21 -08:00
3da455e49c Enforce unique doc filenames and simple wiki-links (#109)
## Summary
- Rename section index files to match their titles (tutorials.md, reference.md, how-to.md, explanation.md) so all filenames are unique
- Convert all ~47 path-based wiki-links to simple filename format across 15 files
- Update doc-filenames task to no longer skip index.md files
- Update doc-links task to reject path-based links containing '/'

This ensures all wiki-links work correctly in obsidian.nvim by making links resolvable by filename alone.

## Testing
- `mise run doc-filenames` - all unique
- `mise run doc-links` - no broken or path-based links
- `mise run doc-titles` - no duplicates

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/109
2026-02-04 17:21:34 -08:00
efdd569285 Improve build workflow with version bump selection and changelog in releases (#104)
## Summary

- Add `version_type` choice input with options: BUMP_PATCH (default), BUMP_MINOR, BUMP_MAJOR, SPECIFIC_VERSION
- Add optional `specific_version` input for explicit version selection
- Include changelog content in Forgejo release body under "What's Changed" section
- Move CHANGELOG.md to repository root (still copied into docs during Quartz build)
- Add CHANGELOG link to docs index page
- Update doc-links script to recognize build-time docs from repo root

## Changes

**Workflow inputs:**
- Previously: single optional `version` string input
- Now: `version_type` choice dropdown (defaults to BUMP_PATCH) + optional `specific_version` for explicit versions

**Release body:**
- Previously: just asset download instructions
- Now: includes "What's Changed" section with changelog entries for this release

**CHANGELOG.md location:**
- Previously: `docs/CHANGELOG.md`
- Now: `CHANGELOG.md` (repo root), copied into docs content during build

## Deployment and Testing

- [ ] Run build workflow with BUMP_PATCH (default)
- [ ] Run build workflow with BUMP_MINOR
- [ ] Verify changelog appears in release body
- [ ] Verify docs site includes CHANGELOG page

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/104
2026-02-04 08:13:16 -08:00
e720b524d3 Rename indri-services-check to services-check (#103)
## Summary
- Rename `indri-services-check` task to `services-check` since it checks all services (indri native, Kubernetes, HTTP endpoints), not just indri-specific ones
- Update references in CLAUDE.md, ai-assistance-guide.md, and troubleshooting.md

## Deployment and Testing
- [ ] Run `mise run services-check` to verify the task works under its new name

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/103
2026-02-04 07:49:15 -08:00
12c92de9e1 Add troubleshooting how-to to zk-docs (#99)
## Summary
- Add the troubleshooting how-to guide to the zk-docs script output
- Placed after how-to/index.md to group how-to content together

## Deployment and Testing
- [x] Verified script runs successfully with `mise run zk-docs`
- [x] Changelog fragment added

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/99
2026-02-04 06:44:23 -08:00
5c79a8dbe2 Add doc-random task and documentation improvements (#98)
## Summary
- Add `doc-random` mise task that selects a random documentation card for review
- Add how-to/knowledgebase section with review-documentation guide
- Add Caddy reference card with proxy configuration details
- Fix replication tutorial sequence (tailscale-setup now links to core-services)
- Fix "BluemeOps" typo in tailscale-setup
- Clean up obsolete zk/ directory references from doc-links

## Deployment and Testing
- [x] `mise run doc-random` works and displays a random card
- [x] `mise run doc-links` passes (all wiki-links valid)
- [x] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/98
2026-02-03 21:17:58 -08:00
f8f11121eb Complete Phase 6: documentation cleanup and integration (#97)
## Summary
- Delete `docs/zk/` directory - all useful content migrated to structured docs
- Delete `docs/README.md` - `docs/index.md` is now the documentation root
- Add `devpi` reference card and `use-pypi-proxy` how-to guide
- Add maintenance notes to `indri` reference (sleep prevention, passwordless sudo)
- Add iCloud Photos backup note to `borgmatic` reference
- Rewrite `zk-docs` mise task to prime AI context with key docs instead of legacy cards
- Update `CLAUDE.md` and `README.md` to remove zk references
- Update `exploring-the-docs` with AI context priming section

This completes the Diataxis documentation restructuring. All six phases are now done.

## Deployment and Testing
- [x] Pre-commit hooks pass (including doc-links validator)
- [ ] Build and deploy to docs.ops.eblu.me to verify rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/97
2026-02-03 20:52:37 -08:00
8e1cc4c638 Remove confirmation prompt from container-tag-and-release
All checks were successful
Build Container / build (push) Successful in 1m35s
Allows the task to run non-interactively.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:59:55 -08:00
2fef0c120d Fix reference/index wiki-link and support path-based links in doc-links
- Use [[reference/index | reference]] for ambiguous index.md link
- Update doc-links to validate both filenames and paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:38:28 -08:00
9630655cc9 Switch to filename-based wiki-links (Quartz resolves by filename)
- Convert all wiki-links from title-based to filename-based
- Update doc-links to validate against filenames
- Add doc-filenames task for duplicate filename detection
- Consolidate doc hooks into single local block in pre-commit config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:29:31 -08:00
cd2623760c Add unspaced pipe detection to doc-links validation
Wiki-links with display text must use spaces around the pipe:
- Good: [[target | display]]
- Bad: [[target|display]]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:20:12 -08:00
3e4b5c2dd3 Convert wiki-link titles to lowercase slugs (#92)
## Summary
- Convert all frontmatter titles to lowercase-hyphenated format (e.g., `grafana-alloy` instead of `Grafana Alloy`)
- Update all wiki-links to use the new slug format
- Update `doc-titles` task to validate slug format (lowercase, hyphens only)

Quartz appears to require titles without spaces for wiki-link resolution.

## Deployment and Testing
- [x] Pre-commit hooks pass (`doc-titles` and `doc-links`)
- [ ] Build docs v1.0.8 and deploy
- [ ] Verify wiki-links resolve correctly (e.g., `[[grafana-alloy]]`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/92
2026-02-03 16:06:35 -08:00
01adc4cf0f Switch to title-based wiki-links (#91)
## Summary
- Remove aliases from all zk cards to prevent them from capturing wiki-links
- Convert all wiki-links from `[[filename|Title]]` to `[[Title]]` format
- Replace `doc-filenames` task with `doc-titles` for duplicate title detection
- Update pre-commit hook to use `doc-titles`

Wiki-links now resolve to reference docs by their frontmatter title, which is more readable and maintainable than filename-based links.

## Deployment and Testing
- [x] Pre-commit hooks pass (including new `doc-titles` check)
- [x] Manually verified zk cards have aliases removed
- [ ] Deploy docs v1.0.7 and verify wiki-links resolve correctly
- [ ] Test links to reference docs (e.g., [[Grafana Alloy]], [[ArgoCD]])

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/91
2026-02-03 15:55:31 -08:00
4ccb7b9a26 Fix wiki-links to use filename-based resolution (#90)
## Summary
- Quartz's "shortest" path mode resolves wiki-links by **filename**, not frontmatter title
- Previous PR used title-based links like `[[Grafana Alloy]]` which looked for non-existent `Grafana-Alloy.md`
- Now using filename-based links like `[[alloy|Grafana Alloy]]` which correctly resolve

## Changes
- Rename zk duplicate files with `-log` suffix (e.g., `argocd.md` → `argocd-log.md`)
- Rename `reference/storage/postgresql.md` to `postgresql-storage.md`
- Convert all 175 wiki-links from `[[Title]]` to `[[filename|Title]]` format
- Rename `doc-card-titles` task to `doc-filenames` (checks filename uniqueness, not titles)
- Update pre-commit hook for renamed task

## Deployment and Testing
- [x] Pre-commit hooks pass
- [x] `mise run doc-filenames` shows no duplicate filenames
- [ ] Verify wiki-links work correctly in Quartz build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/90
2026-02-03 15:30:42 -08:00
a45a78b478 Convert wiki-links to title-based format (#89)
## Summary
- Add `doc-card-titles` mise task to enumerate all doc cards by title/id and detect duplicates
- Remove redundant aliases from zk cards where alias matched the id
- Rename `reference/storage/postgresql.md` title to "PostgreSQL Storage" to avoid duplicate with `reference/services/postgresql.md`
- Convert all 175 path-based wiki-links `[[reference/path|Title]]` to title-based `[[Title]]` format
- Add pre-commit hook to check for duplicate card titles on doc changes

## Deployment and Testing
- [x] Pre-commit hooks pass
- [x] `mise run doc-card-titles` shows no duplicates
- [ ] Verify wiki-links work correctly in Quartz build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/89
2026-02-03 15:04:38 -08:00
b8104d75ad Move zk cards to docs/zk/ for documentation restructuring (#84)
## Summary
- Move all existing zettelkasten cards from `docs/` to `docs/zk/` as a temporary holding area
- Update `zk-docs` mise task to look in the new location
- Add `docs/README.md` explaining the Diataxis-based restructuring plan and target audiences

## Context
This is phase 1 of a multi-phase documentation restructuring effort. The goal is to reorganize docs to follow the Diataxis framework while serving multiple audiences:
1. Erich (owner) - knowledge graph/zk
2. Claude/AI agents - memory and context enrichment
3. New external readers - high-level overview
4. Potential operators/contributors - onboarding
5. Replicators - people wanting to duplicate the approach

## Testing
- [x] Verified `mise run zk-docs` still works with the new path
- [x] Updated obsidian.nvim config (in ~/.config/nvim) to point to new path

## Note
The obsidian.nvim config change is outside this repo but was made as part of this work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/84
2026-02-03 09:13:50 -08:00
c89e69e25f Add docs/ with blumeops zk cards (#82)
## Summary
- Move 21 blumeops-tagged zettelkasten cards from ~/code/personal/zk/ to docs/
- Create symlink ~/code/personal/zk/blumeops -> blumeops/docs for obsidian integration
- Update zk-docs mise task to read from local docs/ directory
- Add blumeops workspace to obsidian.nvim config (strict=true)

## Benefits
- Docs are now git-managed in the blumeops repo (visible on GitHub)
- Wiki links between blumeops docs continue to work via symlink
- obsidian-sync isolation: docs don't sync to work laptop
- Direct editing via obsidian.nvim with dedicated workspace

## Testing
- [x] Files moved to docs/ (21 files)
- [x] Symlink created: ~/code/personal/zk/blumeops -> blumeops/docs
- [x] zk-docs mise task updated and working
- [ ] Verify obsidian.nvim link resolution (after merge)
- [ ] Verify obsidian backlinks work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/82
2026-02-02 21:40:53 -08:00
b8b33b76c8 Remove Plex media server (#78)
## Summary
- Remove plex_metrics ansible role
- Remove Plex Grafana dashboard
- Remove Plex log collection from Alloy config
- Update indri-services-check to check Jellyfin instead of Plex

## Deployment and Testing
- [x] Unloaded plex-metrics LaunchAgent on indri
- [x] Deleted plex-metrics plist and script
- [x] Deleted plex.prom textfile
- [ ] Deploy Alloy config update
- [ ] Sync grafana-config to remove dashboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/78
2026-01-30 17:06:00 -08:00
66badfafd1 Migrate k8s services to Caddy (*.ops.eblu.me) (#59)
All checks were successful
Build Container / build (push) Successful in 13s
## Summary
- Add Caddy reverse proxy routes for all k8s services (grafana, argocd, prometheus, loki, miniflux, devpi, kiwix, torrent, teslamate)
- Add PostgreSQL via Caddy L4 TCP proxy on port 5432
- Caddy proxies to existing Tailscale endpoints - traffic stays local on indri
- Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs continue to work

## Updated References
- Alloy: prometheus/loki push endpoints → `*.ops.eblu.me`
- Borgmatic: PostgreSQL backup host → `pg.ops.eblu.me`
- Devpi: DEVPI_OUTSIDE_URL → `pypi.ops.eblu.me`
- indri-services-check: health check URLs
- CLAUDE.md: argocd login command

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy new Caddy config
- [ ] Test HTTP services: `curl https://grafana.ops.eblu.me/api/health`
- [ ] Test PostgreSQL: `pg_isready -h pg.ops.eblu.me -p 5432`
- [ ] Run `mise run provision-indri -- --tags alloy` to update Alloy endpoints
- [ ] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic
- [ ] Sync devpi in ArgoCD: `argocd app sync devpi`
- [ ] Re-login to ArgoCD: `argocd login argocd.ops.eblu.me ...`
- [ ] Run `mise run indri-services-check` to verify all services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/59
2026-01-25 12:56:31 -08:00
d6e6b48f6a Migrate registry to Caddy (registry.ops.eblu.me) (#58)
## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
2026-01-25 12:06:15 -08:00
1184b4de1d Add Caddy layer4 for Forgejo SSH (#56)
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin

## What This Enables
Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube

This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.

## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints

## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.

## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56
2026-01-25 11:37:23 -08:00
b08faa50cc Add Gandi DNS management via Pulumi (#54)
## Summary
- Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/`
- Add Gandi LiveDNS management for `eblu.me` domain
- Create wildcard DNS record `*.ops.eblu.me` → indri's Tailscale IP (100.98.163.89)
- Add mise tasks: `dns-up`, `dns-preview`
- Update `tailnet-up` to pass `--yes` by default
- Document PAT cycling process (expires every 30 days)

## Background
This enables using real DNS names (`*.ops.eblu.me`) that resolve to Tailscale IPs,
which allows containers and other systems to resolve services without depending on
MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain
tailnet-only while using standard DNS.

## Deployment and Testing
- [ ] Run `cd pulumi/gandi && uv sync` to install dependencies
- [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack
- [ ] Run `mise run dns-preview` to verify configuration
- [ ] Run `mise run dns-up` to apply DNS records
- [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54
2026-01-25 08:15:46 -08:00
b598ea8d5a Add indri-runner-logs script to fetch workflow logs
Fetches logs for Forgejo Actions runs from indri's local storage.
Logs are stored as zstd-compressed files in the forgejo data directory.

Usage: mise run indri-runner-logs <run_id>

Only works for runs executed by the indri-host-runner.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:25:32 -08:00
31697b4d63 Add nettest container for CI/CD network debugging (#52)
Some checks failed
Build Container / build (push) Failing after 18s
## Summary
- Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script
- Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags
- Test script checks DNS resolution and HTTPS connectivity to forge and registry

## Deployment and Testing
- [ ] Merge PR to main
- [ ] Run `mise run container-release nettest v0.1.0` to trigger first build
- [ ] Verify workflow runs successfully and container can reach tailnet services
- [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52
2026-01-24 16:54:35 -08:00
8ca8798121 Switch to Buildah for container builds (#51)
All checks were successful
Test CI / test (push) Successful in 4s
## Summary
- Replace Docker with Buildah for container image builds
- No Docker socket required - buildah is daemonless
- Cleaner security model (no privileged containers or socket mounting)
- Remove Docker-related security context from deployment

## Changes
- Update Dockerfile to install buildah/podman instead of docker-cli
- Configure buildah storage with overlay driver and fuse-overlayfs
- Update composite action to use `buildah bud` and `buildah push`
- Add `imagePullPolicy: Always` to ensure fresh image pulls
- Update test workflow to verify buildah/podman

## Testing
- [ ] Runner pod starts successfully
- [ ] Buildah is available in runner
- [ ] Test workflow verifies buildah/podman versions
- [ ] Container build workflow builds and pushes to zot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51
2026-01-24 13:30:26 -08:00
25fa2ea665 Update indri-services-check 2026-01-22 21:31:11 -08:00
17023085cb Migrate observability stack to Kubernetes (#42)
Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.

Summary

  - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
  - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
  - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
  - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
  - Add ACL rule for port 9187 (CNPG metrics)
  - Delete obsolete ansible roles for prometheus and loki

Changes

  - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
  - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
  - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
  - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
  - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
  - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
  - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
  - pulumi/policy.hujson - ACL for port 9187
  - Deleted ansible/roles/prometheus/ and ansible/roles/loki/

Deployment and Testing

  - Stop prometheus and loki on indri
  - Sync ArgoCD apps (apps, prometheus, loki, grafana)
  - Run mise run provision-indri -- --tags alloy
  - Verify Grafana dashboards show data

🤖 Generated with https://claude.ai/claude-code

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
2026-01-22 12:06:02 -08:00
2e7ca8a5ff Add mise task to list unresolved PR comments (#40)
## Summary
- New `pr-comments` mise task queries Forge API for unresolved review comments on a PR
- Task takes a PR number as argument and displays all comments without a resolver
- Updated CLAUDE.md to include using this task after user reviews PRs

## Deployment and Testing
- [x] Tested task on PR #39 (shows no unresolved comments since all were resolved)
- [x] Tested error handling with non-existent PR #9999

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/40
2026-01-21 19:14:27 -08:00
21848a7919 P5.1: Migrate minikube from podman to QEMU2 driver (#38)
## Summary
- Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support
- Update ansible minikube role with qemu installation and containerd runtime
- Remove podman role dependency from indri.yml
- Add synology user creation steps and post-migration zot reconfiguration notes

## Why
Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support.

## Deployment and Testing
- [ ] Create k8s-storage user on Synology DSM
- [ ] Store credentials in 1Password (synology-k8s-storage)
- [ ] Export current k8s state
- [ ] Stop and delete podman-based minikube cluster
- [ ] Run ansible to create QEMU2 cluster
- [ ] Test NFS volume mount with test pod
- [ ] Redeploy ArgoCD and all apps
- [ ] Verify all services healthy
- [ ] Reconfigure zot registry mirrors for containerd (post-migration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38
2026-01-21 16:03:37 -08:00
735b643429 P4: Miniflux migration + PostgreSQL consolidation (#33)
## Summary
- Deploy miniflux in k8s via ArgoCD
- Expose via Tailscale Ingress at feed.tail8d86e.ts.net
- Retire brew PostgreSQL (no longer needed)
- Rename k8s-pg to pg (canonical hostname)
- Remove ansible miniflux and postgresql roles
- Update borgmatic to backup pg.tail8d86e.ts.net
- Update all zk documentation

## Deployment and Testing
- [x] Miniflux pod running in k8s
- [x] User login works at https://feed.tail8d86e.ts.net
- [x] Feeds and entries visible
- [x] brew miniflux and postgresql stopped
- [x] Tailscale services migrated (feed, pg)
- [x] zk documentation updated
- [x] Run ansible to apply role removals
- [ ] Verify borgmatic backup with new pg hostname

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33
2026-01-20 09:04:47 -08:00
a8f4d00294 K8s Migration Phase 1: Infrastructure Setup (#29)
## Summary
- Split k8s migration plan into phases folder for easier navigation
- Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads
- Phase 1 work in progress

## Phase 1 Goals
- Tailscale Kubernetes Operator
- CloudNativePG Operator
- PostgreSQL cluster for future app migrations

## Deployment and Testing
- [ ] Review Phase 1 plan
- [ ] `mise run tailnet-preview` to verify ACL changes
- [ ] `mise run tailnet-up` to apply ACL changes
- [ ] Create Tailscale OAuth client (manual)
- [ ] Deploy operators and PostgreSQL cluster

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29
2026-01-19 09:49:52 -08:00
19a82373d5 K8s Migration Phase 0: Foundation Infrastructure (#26)
## Summary
- Step 0.1: Update Pulumi ACLs with tag:registry
- Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent
- Step 0.4: Add Zot to Tailscale Serve configuration
- Step 0.5: Create Zot metrics role for Prometheus scraping
- Step 0.6: Add Zot log collection to Alloy
- Step 0.7: Update indri-services-check with zot checks
- Step 0.8: Add podman role for container runtime
- Step 0.9: Add minikube role for Kubernetes cluster
- Step 0.10: Configure remote kubectl access with 1Password credentials

## Remaining Steps
- [ ] Step 0.11: Add minikube to indri-services-check
- [ ] Step 0.12: Create zettelkasten documentation
- [ ] Step 0.13: Verify main playbook (already done - roles added)

## Deployment and Testing
- [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net
- [x] Podman machine running on indri
- [x] Minikube cluster running on indri
- [x] kubectl access from gilbert working with 1Password credentials
- [ ] indri-services-check passes all checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/26
2026-01-18 12:06:28 -08:00
9931829d03 Add pre-commit hooks for code quality (#19)
## Summary
- Add pre-commit framework with hooks for YAML, Ansible, Python, shell, TOML, JSON, and secret detection
- Fix all 91+ ansible-lint violations (variable naming, handler capitalization, changed_when)
- Fix shellcheck warnings in mise-tasks scripts
- Document pre-commit setup in README.md

## Deployment and Testing
- [x] All pre-commit hooks pass (`uvx pre-commit run --all-files`)
- [x] Test ansible playbook with `--check` mode
- [x] Run `mise run indri-services-check` after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/19
2026-01-16 19:33:02 -08:00
812b78bf61 Use explicit PostgreSQL superuser name and fix check mode (#17)
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check

## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
2026-01-16 14:41:36 -08:00