Commit graph

274 commits

Author SHA1 Message Date
a5765f9cf2 Add op-backup mise task for encrypted 1Password disaster recovery (#136)
## Summary
- Adds `mise run op-backup` task that encrypts a 1Password .1pux export with `age` using the master password + secret key as passphrase, SCPs to indri for borgmatic pickup, then deletes the plaintext
- Adds `age` to the Brewfile
- Borgmatic already backs up `/Users/erichblume/Documents` on indri, which covers the `1password-backup/` subdirectory — no config change needed

## Disaster recovery
1. Restore borgmatic archive to retrieve the `.age` file
2. Open Emergency Kit from safety deposit box
3. `age --decrypt <file>.age > export.1pux` (passphrase: `{master_password}:{secret_key}`)
4. Open `.1pux` with 1Password or unzip to inspect

## Usage
```
# Export all vaults from 1Password desktop app as .1pux, then:
mise run op-backup ~/Documents/1Password-export.1pux

# Or run without args for interactive prompt:
mise run op-backup
```

## Test plan
- [ ] `brew install age`
- [ ] Export a test vault from 1Password as .1pux
- [ ] Run `mise run op-backup` with the export path
- [ ] Verify encrypted file appears on indri at `~/Documents/1password-backup/`
- [ ] Verify plaintext .1pux is deleted from gilbert
- [ ] Test decryption: `age --decrypt <file>.age > test.1pux` with password:secret_key
- [ ] Verify decrypted .1pux can be opened/unzipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/136
2026-02-09 20:37:39 -08:00
85e36cd807 Operations and observability for sifaka NAS (#135)
## Summary
- Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring
- Formalize existing `node_exporter` container under Ansible management
- Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus
- Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
- Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS
- Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles

## Prerequisites before deploy
- [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
- [ ] Verify `ssh eblume@sifaka 'docker ps'` works
- [ ] Run `mise run provision-sifaka` to deploy containers
- [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes
- [ ] `argocd app sync prometheus` + `argocd app sync grafana-config`

## Test plan
- [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics`
- [ ] Verify Prometheus targets page shows both sifaka jobs as UP
- [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135
2026-02-09 17:44:05 -08:00
4ee643a81d Serve friendly error page when Fly.io proxy upstreams are unreachable (#133)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m50s
## Summary
- Adds a branded 503 error page served when upstreams are unreachable (indri offline, Tailscale tunnel down, emergency shutoff, etc.)
- Stale cache is still served first when available (`proxy_cache_use_stale` takes priority)
- Test endpoint at `docs.eblu.me/_error` to preview the page without killing upstreams
- `proxy_intercept_errors on` also catches error responses returned by the upstream itself

## Files Changed
- `fly/error.html` — Self-contained error page (dark theme, links to BlumeOps repo)
- `fly/nginx.conf` — `error_page`, `internal` location, `/_error` test location, `proxy_intercept_errors`
- `fly/Dockerfile` — COPY error.html into image

## Test Plan
- [ ] Deploy to Fly.io
- [ ] Visit `docs.eblu.me/_error` to verify the page renders
- [ ] Optionally stop indri/Tailscale to confirm the page shows on real 502/503/504

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/133
2026-02-09 12:01:24 -08:00
959b6842bc Zero-downtime Fly.io deploys (#132)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m40s
## Summary
- Start nginx after Tailscale connects (community best practice for Tailscale sidecars)
- Switch to `bluegreen` deploy strategy — old machine serves until new one is healthy
- Replace top-level `[checks]` with `[[http_service.checks]]` — only service-level checks gate traffic routing ([confirmed by Fly.io staff](https://community.fly.io/t/clarifying-the-types-of-health-checks/20379))
- Remove sentinel file and nginx if-check (no longer needed)

Supersedes the approach in #131 — that helped (502 window dropped from ~30s to ~3s) but couldn't fully eliminate it because top-level checks don't gate routing and Fly.io's proxy sends traffic as soon as the port is reachable.

## Deployment and Testing
- [ ] Merge and `fly deploy` from `fly/` directory
- [ ] Verify deploy completes with zero 502s (watch `fly logs` and Grafana docs-apm)
- [ ] Confirm `fly checks list` shows the new service-level check passing

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/132
2026-02-09 11:34:19 -08:00
bd61da4f85 Fix 502 errors during Fly.io proxy deploys (#131)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m20s
## Summary
- Health check (`/healthz`) now returns 503 until Tailscale is connected
- `start.sh` creates `/tmp/tailscale-ready` sentinel after `tailscale up` succeeds
- Fly.io keeps the old machine serving traffic during the ~7s startup window

Previously, nginx passed the health check immediately, Fly.io routed traffic to the new machine, but MagicDNS wasn't available yet — causing upstream DNS timeouts and 502s on every request until Tailscale connected.

## Deployment and Testing
- [ ] Merge and `fly deploy` from `fly/` directory
- [ ] Verify deploy completes with zero 502s (check Grafana docs-apm dashboard)
- [ ] Confirm health check transitions from 503 → 200 in `fly logs`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/131
2026-02-09 11:07:36 -08:00
3415cad38c Log real client IPs via Fly-Client-IP header (#130)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 59s
## Summary
- Add `client_ip` field to the Fly.io nginx JSON log format, sourced from `Fly-Client-IP` header
- Extract `client_ip` in the Alloy pipeline so it's available as a parsed field in Loki
- Keeps `remote_addr` (the internal proxy IP) for debugging

Fixes: Grafana access logs for docs.eblu.me showing 172.16.11.178 for every request instead of real visitor IPs.

## Deployment and Testing
- [ ] Deploy updated fly.io proxy: `fly deploy` from `fly/` directory
- [ ] Verify in Grafana that new log lines include `client_ip` with real IPs
- [ ] Confirm `remote_addr` still shows the proxy IP (preserved for debugging)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/130
2026-02-09 11:02:06 -08:00
Forgejo Actions
92a1081302 Update docs release to v1.5.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-09 15:30:21 +00:00
9e361cf38f Add docs-review task with last-reviewed frontmatter tracking (#129) v1.5.2
## Summary
- New `docs-review` mise task replaces `docs-review-random` — sorts docs by `last-reviewed` frontmatter field (never-reviewed first, then oldest)
- Updated review-documentation how-to to explain the new workflow and how to mark cards as reviewed
- Updated ai-assistance-guide task table to reference `docs-review`

## Test plan
- [x] `mise run docs-review` runs and shows staleness table + most stale doc
- [x] `mise run docs-review -- --limit 5` respects the limit flag
- [x] All pre-commit checks pass (links, index, filenames)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/129
2026-02-09 07:29:45 -08:00
c6f8fcd346 Fix fly-deploy WARNING by starting nginx before Tailscale (#128)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m3s
## Summary
- Start nginx before Tailscale in `start.sh` so port 8080 is bound immediately, eliminating the "app is not listening on the expected address" WARNING during `fly deploy`
- Switch `proxy_pass` to use a variable with `resolver 100.100.100.100 valid=30s` so nginx can start without resolving MagicDNS names at config load time
- DNS results cached 30s per worker — no per-request lookup overhead

## Context
The WARNING was a race condition: Fly checks for listeners right after the machine starts, but `start.sh` ran ~5-10s of Tailscale setup before starting nginx. The health check always passed later, but the warning was noisy.

## Test plan
- [ ] Merge and let the deploy-fly workflow trigger
- [ ] Check runner logs for absence of the WARNING
- [ ] Verify `docs.eblu.me` still serves correctly
- [ ] Verify `/healthz` still passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/128
2026-02-09 07:01:58 -08:00
a0b076172f Fix Immich/Homepage Ingress host matching, add missing service checks (#127)
## Summary

- Fix Immich Ingress `host: photos` causing 404 with ProxyGroup (same FQDN mismatch as Prometheus/Loki)
- Migrate Homepage from old per-service Tailscale proxy to shared ProxyGroup (was the last holdout)
- Add Immich and Navidrome to `services-check` HTTP endpoints

## Deployment Notes

- Already tested on branch: Immich and Homepage both return 200 via Caddy
- Homepage's old Helm-managed Ingress was deleted manually; ArgoCD may recreate it on sync — prune with `argocd app sync homepage --prune` after merge
- Old per-service `ts-homepage-*` pod in tailscale namespace can be cleaned up after confirming ProxyGroup works

## Test Plan

- [x] `curl https://photos.ops.eblu.me/` returns 200
- [x] `curl https://go.ops.eblu.me/` returns 200
- [ ] `mise run services-check` fully passes after merge

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/127
2026-02-08 22:12:50 -08:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
7f41621c7f Migrate Ansible op calls to op read URI syntax (#125)
## Summary
- Convert all 12 `op item get ... --fields ... --reveal` calls in Ansible to the newer `op read "op://vault/item/field"` syntax
- Remove the `regex_replace` workaround on the Fly deploy token (no longer needed since `op read` returns clean unquoted values)
- Covers `ansible/playbooks/indri.yml`, `ansible/roles/caddy/tasks/main.yml`, `ansible/roles/jellyfin_metrics/tasks/main.yml`, and `ansible/roles/alloy/tasks/main.yml`

## Test plan
- [x] `mise run provision-indri -- --check --diff` dry run passes (ok=67, failed=0)
- [x] No `op item get` calls remain in `ansible/` directory
- [x] All pre-commit hooks pass (yaml, ansible-lint, TruffleHog, etc.)
- [ ] Full provision run after merge to confirm secrets resolve correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/125
2026-02-08 10:52:43 -08:00
234c46c302 Filter blumeops-tasks to hide future-dated tasks (#124)
## Summary
- Tasks with a due date are now only shown when due today or earlier
- Recurring tasks stay hidden until their next occurrence is actionable
- Tasks without a due date continue to always display

## Test plan
- [x] Ran `mise run blumeops-tasks` — verified 18 undated tasks display correctly
- [x] Confirmed "BlumeOps doc review" (due tomorrow) is correctly hidden

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/124
2026-02-08 10:38:44 -08:00
Forgejo Actions
c8d0af6644 Update docs release to v1.5.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 18:06:46 +00:00
cc54b4f565 Add Fly.io proxy observability via embedded Alloy (#123)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m16s
v1.5.1
## Summary

- Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus)
- Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting)
- Create two Grafana dashboards: **Docs APM** (per-service view filtered by `host="docs.eblu.me"`) and **Fly.io Proxy Health** (aggregate proxy health across all upstream services)

## Changed Files

| File | Change |
|------|--------|
| `fly/nginx.conf` | Add JSON `log_format` + `access_log`, add `stub_status` endpoint |
| `fly/Dockerfile` | COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config |
| `fly/alloy.river` | **New** — Alloy config: log tailing, metric extraction, remote_write |
| `fly/start.sh` | Start Alloy after Tailscale, before nginx |
| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` | **New** — Docs APM dashboard |
| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` | **New** — Fly.io Proxy Health dashboard |
| `argocd/manifests/grafana-config/kustomization.yaml` | Register new dashboard configmaps |
| `docs/reference/services/flyio-proxy.md` | Document observability setup |

## Deployment and Testing

- [ ] `mise run fly-deploy` — rebuild container with Alloy
- [ ] `curl https://docs.eblu.me/` — generate traffic
- [ ] `fly logs -a blumeops-proxy` — verify Alloy startup
- [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}`
- [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}`
- [ ] `argocd app sync grafana-config` — deploy dashboards
- [ ] Verify dashboards show data in Grafana
- [ ] `mise run services-check` — no regressions

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123
2026-02-08 10:05:38 -08:00
61862b44c0 Add docs.eblu.me and Fly.io to services-check (#122)
## Summary

- Add public docs (`docs.eblu.me`) and Fly.io healthz checks to `mise run services-check`

## Test plan

- [ ] `mise run services-check` includes new "Public services (via Fly.io)" section

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/122
2026-02-08 02:48:15 -08:00
6b1167a345 Fix Fly.io deploy token quoting (#121)
## Summary

- Strip literal quotes from Fly.io deploy token when syncing to Forgejo Actions secrets
- The `op` CLI wraps values containing spaces in quotes; the Fly.io token format is `FlyV1 <key>` (contains a space)
- This caused the CI deploy workflow to fail with a 401 auth error

## Test plan

- [x] `mise run provision-indri -- --tags forgejo_actions_secrets` succeeds
- [ ] Re-trigger deploy-fly workflow after merge — should authenticate successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/121
2026-02-08 02:43:45 -08:00
Forgejo Actions
c46d55060d Update docs release to v1.5.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 10:37:30 +00:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
v1.5.0
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
fbedaf2833 docs/expose-service-publicly pt2 - fly.io (#119)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/119
2026-02-08 00:38:27 -08:00
ae2445c99a Add how-to guide for public service exposure via Cloudflare (#118)
## Summary
- Adds `docs/how-to/expose-service-publicly.md` documenting the full plan for exposing `docs.eblu.me` to the public internet
- Covers Cloudflare Tunnel + CDN architecture, DNS migration from Gandi, Caddy TLS changes, Pulumi IaC, k8s cloudflared deployment, and verification steps
- Pattern is reusable for future public services
- Marked as "Plan — not yet implemented" status

## Test plan
- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] All pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/118
2026-02-07 22:09:52 -08:00
Forgejo Actions
11c76d4768 Update docs release to v1.4.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:45:40 +00:00
dc46eb7def Update all docs titles to human-readable (#117) v1.4.2
## Summary
- Updated frontmatter `title:` in all 63 doc cards from slug-case to human-readable (e.g. `borgmatic` → `Borgmatic`, `ai-assistance-guide` → `AI Assistance Guide`)
- Titles now closely match file stems so `[[wiki-links]]` render naturally without alternate anchor text
- Corrected titles that diverged from stems (e.g. `host-inventory` → `Hosts`, `grafana-alloy` → `Alloy`, `argocd-applications` → `Apps`)
- Deleted `title-test-alpha.md` and `title-test-beta.md` test cards and removed their reference index entry

## Deployment and Testing
- [x] `docs-check-links` passes — all wiki-links valid
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes
- [ ] Verify titles render correctly on docs site after deploy

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/117
2026-02-07 21:44:57 -08:00
Forgejo Actions
ab7efd8c1c Update docs release to v1.4.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:27:23 +00:00
a7d6d44d3d Remove title slug check and test duplicate titles (#116) v1.4.1
## Summary
- Remove `docs-check-titles` pre-commit hook and mise task — wiki-links resolve by filename stem, not frontmatter title, so slug-format titles and uniqueness aren't needed
- Add two test cards (`title-test-alpha`, `title-test-beta`) with identical `title: Title Test Card` to verify duplicate titles don't break Quartz or obsidian.nvim
- Retitle `index.md` from `blumeops-documentation` to `BlumeOps`
- Add GitHub and Forgejo repo links to homepage intro

## Test plan
- [ ] Deploy to docs site and verify both test cards render and cross-link correctly
- [ ] Verify homepage title renders as "BlumeOps"
- [ ] Verify repo links on homepage work
- [ ] After confirming, remove test cards in a follow-up

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/116
2026-02-07 21:26:18 -08:00
Forgejo Actions
3f5017f732 Update docs release to v1.4.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:03:34 +00:00
8e4afe77e0 Add Gandi DNS docs and rewrite homepage intro (#115) v1.4.0
## Summary
- New reference card (`docs/reference/infrastructure/gandi.md`) covering DNS records, Pulumi config, TLS integration
- New how-to guide (`docs/how-to/gandi-operations.md`) for DNS deployment and PAT cycling with `pbpaste` shortcut
- Rewritten homepage intro for wider audience ahead of public docs.eblu.me
- Cross-linked from reference index, routing, caddy, and how-to index
- Fixed PAT expiration inaccuracy in `pulumi/gandi/README.md` (max is 90 days, not 30)

## Test plan
- [ ] Verify wiki-links resolve in Quartz build
- [ ] Review gandi reference card for accuracy
- [ ] Review gandi-operations how-to for accuracy
- [ ] Check homepage reads well for external visitors

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/115
2026-02-07 21:02:10 -08:00
2cb76de5a2 Fix doc tag inconsistencies and add missing ai changelog type (#114)
## Summary
- Add missing `ai` changelog fragment type to the update-documentation how-to guide (comment and table)
- Consolidate `cicd` → `ci-cd` tag on forgejo.md
- Consolidate `network` → `networking` tag on routing.md and tailscale.md

Found during random doc review via `docs-review-tags`.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/114
2026-02-06 18:52:36 -08:00
cb343a2e35 Rename doc-* mise tasks to docs-check-* / docs-review-* (#113)
## Summary
- Rename 4 automated-check tasks: `doc-titles` → `docs-check-titles`, `doc-filenames` → `docs-check-filenames`, `doc-links` → `docs-check-links`, `doc-index` → `docs-check-index`
- Rename 3 interactive-review tasks: `doc-random` → `docs-review-random`, `doc-tags` → `docs-review-tags`, `doc-stale` → `docs-review-stale`
- Update all references in `.pre-commit-config.yaml`, `ai-assistance-guide.md`, and `review-documentation.md`
- Historical changelog entries left as-is

## Test plan
- [x] `mise run docs-check-titles` exits 0
- [x] `mise run docs-check-links` exits 0
- [x] `mise run docs-review-tags` exits 0
- [x] `mise run doc-titles` fails with "no task found"
- [x] All pre-commit hooks pass (including renamed hook IDs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/113
2026-02-06 07:08:46 -08:00
060c7a24e3 Review exploring-the-docs and add doc consistency checks (#112)
## Summary
- Reviewed and cleaned up exploring-the-docs tutorial: simplified wiki-links, fixed broken replication/ reference, added Related section, corrected zk-docs flags to match CLAUDE.md
- Added orphan detection to doc-links (finds docs not linked from any other doc)
- Added new doc tooling: `doc-index` (checks category index coverage), `doc-stale` (staleness report), `doc-tags` (tag inventory)
- Added `doc-index` as a pre-commit hook
- Updated use-pypi-proxy to document env-var-based proxy toggle for pip/uv
- Updated ai-assistance-guide with new doc task descriptions

## Test plan
- [ ] Run `mise run doc-links` — passes
- [ ] Run `mise run doc-index` — passes
- [ ] Run `mise run doc-stale` — informational output
- [ ] Run `mise run doc-tags` — informational output
- [ ] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/112
2026-02-05 21:12:06 -08:00
61c5328ec2 Update restart-indri docs after power outage recovery (#111)
## Summary
- Simplified restart-indri startup procedure to match reality (most services autostart via mcquack LaunchAgents and brew services)
- Added minikube tailscale serve port fix step (`mise run provision-indri -- --tags minikube`)
- Added Anker SOLIX F2000 GaNPrime UPS to indri reference card

## Context
After a power outage, discovered that the restart-indri docs overstated what needs manual intervention. Docker Desktop, Forgejo, Caddy, and all mcquack services autostart. Only Amphetamine, AutoMounter, and minikube need manual action.

## Test plan
- [ ] Verify restart-indri doc reads clearly
- [ ] Verify indri ref card UPS entry renders correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/111
2026-02-05 21:05:35 -08:00
3b4ff91469 Fix homepage Admin bookmark icons (#110)
## Summary
- Fix broken Pulumi icon: changed `pulumi` to `si-pulumi` (Simple Icons prefix required)
- Fix broken ArgoCD icon: changed `argocd` to `argo-cd` (Dashboard Icons uses hyphenated name)

## Deployment and Testing
- [ ] Sync homepage app in ArgoCD
- [ ] Verify icons appear on go.ops.eblu.me Admin bookmarks section

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/110
2026-02-05 06:29:39 -08:00
ab1386b015 Fix zk-docs 2026-02-04 17:31:21 -08:00
Forgejo Actions
808bc507d8 Update docs release to v1.3.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-05 01:22:10 +00:00
3da455e49c Enforce unique doc filenames and simple wiki-links (#109) v1.3.4
## Summary
- Rename section index files to match their titles (tutorials.md, reference.md, how-to.md, explanation.md) so all filenames are unique
- Convert all ~47 path-based wiki-links to simple filename format across 15 files
- Update doc-filenames task to no longer skip index.md files
- Update doc-links task to reject path-based links containing '/'

This ensures all wiki-links work correctly in obsidian.nvim by making links resolvable by filename alone.

## Testing
- `mise run doc-filenames` - all unique
- `mise run doc-links` - no broken or path-based links
- `mise run doc-titles` - no duplicates

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/109
2026-02-04 17:21:34 -08:00
Forgejo Actions
a03a9faaad Update docs release to v1.3.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 22:40:18 +00:00
7aa0e60b27 Add how-to guide for restarting indri (#108) v1.3.3
## Summary
- Add `docs/how-to/restart-indri.md` with safe shutdown and startup procedures
- Add `docs/reference/services/automounter.md` documenting the SMB share automounter app
- Update indri reference card with GUI applications section
- Update how-to and reference indexes

## Test plan
- [ ] Review restart-indri.md for accuracy
- [ ] Verify AutoMounter details are correct
- [ ] Perform the actual restart using this guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/108
2026-02-04 14:39:48 -08:00
74bd5abe54 Add IaC for Forgejo Actions secrets via Ansible (#107)
## Summary
- New `forgejo_actions_secrets` Ansible role syncs repository-level Actions secrets from 1Password to Forgejo via the Forgejo API
- Replaces manual process of copying secrets from 1Password to Forgejo UI
- Documents the one-time PAT setup requirement in forgejo.md

## Manual Setup Required
Before this role can run, a Forgejo PAT must be created:
1. Go to https://forge.ops.eblu.me/user/settings/applications
2. Create a new token with `write:repository` scope
3. Store it in 1Password → "Forgejo Secrets" item → `api-token` field

This has already been done.

## Test Plan
- [x] Ran `mise run provision-indri -- --tags forgejo_actions_secrets` successfully
- [x] Verified secret synced (API returned 204 = updated existing)
- [x] Ansible-lint passes

🤖 Generated with [Claude Code](https://claude.ai/code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/107
2026-02-04 09:11:01 -08:00
Forgejo Actions
e15caec898 Update docs release to v1.3.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:47:27 +00:00
95610d8e54 Fix Quartz build to preserve git history for accurate file dates (#106) v1.3.2
## Summary

Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings by using Quartz's `-d docs` flag instead of symlinking.

## Problem

The previous approach symlinked `content -> docs`, but git doesn't follow symlinks. When Quartz asked git about `content/index.md`, git had no history for that path.

## Solution

Use `npx quartz build -d docs` to tell Quartz to read from `docs/` directly. Now when Quartz asks git about `docs/index.md`, git finds the actual file history.

- CHANGELOG.md is copied (not symlinked) into `docs/` for the build, then removed
- All other files have accurate git-based dates

## Testing

Tested locally - build produces no warnings.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/106
2026-02-04 08:46:47 -08:00
Forgejo Actions
4aeade1543 Update docs release to v1.3.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:26:24 +00:00
03bda41de4 Fix Quartz build to preserve git history for accurate file dates (#105) v1.3.1
## Summary

Fixes the "isn't yet tracked by git, dates will be inaccurate" warnings in the Build docs step by restructuring how Quartz builds the documentation.

## Problem

Previously, we copied docs into Quartz's content folder. Since this was inside a fresh Quartz clone with no history of our files, the `CreatedModifiedDate` plugin couldn't determine accurate dates.

## Solution

Build Quartz from within the blumeops repo instead:
1. Copy Quartz's build system (quartz/, package.json, etc.) into the workspace
2. Symlink `content` -> `docs` (preserves git history)
3. Symlink `docs/CHANGELOG.md` -> `../CHANGELOG.md`
4. Build from workspace root where git can trace file history
5. Clean up artifacts after creating tarball

## Deployment and Testing

- [ ] Run build workflow and verify no "not tracked by git" warnings
- [ ] Verify file dates appear correctly on built docs site

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/105
2026-02-04 08:25:46 -08:00
Forgejo Actions
1835e3e80e Update docs release to v1.3.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:14:08 +00:00
efdd569285 Improve build workflow with version bump selection and changelog in releases (#104) v1.3.0
## Summary

- Add `version_type` choice input with options: BUMP_PATCH (default), BUMP_MINOR, BUMP_MAJOR, SPECIFIC_VERSION
- Add optional `specific_version` input for explicit version selection
- Include changelog content in Forgejo release body under "What's Changed" section
- Move CHANGELOG.md to repository root (still copied into docs during Quartz build)
- Add CHANGELOG link to docs index page
- Update doc-links script to recognize build-time docs from repo root

## Changes

**Workflow inputs:**
- Previously: single optional `version` string input
- Now: `version_type` choice dropdown (defaults to BUMP_PATCH) + optional `specific_version` for explicit versions

**Release body:**
- Previously: just asset download instructions
- Now: includes "What's Changed" section with changelog entries for this release

**CHANGELOG.md location:**
- Previously: `docs/CHANGELOG.md`
- Now: `CHANGELOG.md` (repo root), copied into docs content during build

## Deployment and Testing

- [ ] Run build workflow with BUMP_PATCH (default)
- [ ] Run build workflow with BUMP_MINOR
- [ ] Verify changelog appears in release body
- [ ] Verify docs site includes CHANGELOG page

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/104
2026-02-04 08:13:16 -08:00
e720b524d3 Rename indri-services-check to services-check (#103)
## Summary
- Rename `indri-services-check` task to `services-check` since it checks all services (indri native, Kubernetes, HTTP endpoints), not just indri-specific ones
- Update references in CLAUDE.md, ai-assistance-guide.md, and troubleshooting.md

## Deployment and Testing
- [ ] Run `mise run services-check` to verify the task works under its new name

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/103
2026-02-04 07:49:15 -08:00
aabfcf6fc0 Document Forgejo Actions secrets (#102)
## Summary
- Add "Forgejo Actions Secrets" section to forgejo reference card
- Document that `ARGOCD_AUTH_TOKEN` is used by `build-blumeops.yaml` workflow
- Note that secrets are stored in 1Password but manually copied to Forgejo (no auto-sync)
- Add missing `build-blumeops.yaml` to workflows list
- Clarify distinction between server config secrets (1Password → Ansible) vs CI/CD secrets (Forgejo UI)

## Context
The forgejo-runner ArgoCD app was showing OutOfSync because a previous attempt stored `argocd_token` in the ExternalSecret. This was incorrect - the token is actually a Forgejo Actions secret, not a k8s secret. Synced the app to remove the drift and added documentation to prevent future confusion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/102
2026-02-04 07:32:32 -08:00
1e13d4b83d Fix Navidrome automatic library scan schedule (#101)
## Summary
- Fix env var name from `ND_SCANSCHEDULE` to `ND_SCANNER_SCHEDULE` (Navidrome uses viper config where dots become underscores)
- Use explicit `@every 1h` format for clarity
- Reorder CLAUDE.md rules to emphasize running zk-docs first

## Root Cause
Navidrome logs showed "Periodic scan is DISABLED" at startup despite the env var being set. The config key is `scanner.schedule`, which translates to `ND_SCANNER_SCHEDULE` (not `ND_SCANSCHEDULE`).

## Deployment and Testing
- [ ] Sync navidrome app: `argocd app sync navidrome`
- [ ] Verify pod restarts with new env var
- [ ] Check logs for "Scheduling scanner" message instead of "Periodic scan is DISABLED"
- [ ] Wait ~1 hour and confirm scan runs automatically

🤖 Generated with [Claude Code](https://claude.ai/code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/101
2026-02-04 07:23:12 -08:00
72f9f21d46 Remove iCloud Photos from borgmatic backup (#100)
## Summary
- Remove ~/Pictures from borgmatic source directories
- Update borgmatic and backup policy documentation
- Add Sifaka-Native Data section to clarify that photos (via Immich), music (via Navidrome), and video (via Jellyfin) are stored directly on Sifaka

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags borgmatic --check --diff` to preview changes
- [ ] Run `mise run provision-indri -- --tags borgmatic` to apply
- [ ] Verify borgmatic config no longer includes ~/Pictures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/100
2026-02-04 07:09:28 -08:00
12c92de9e1 Add troubleshooting how-to to zk-docs (#99)
## Summary
- Add the troubleshooting how-to guide to the zk-docs script output
- Placed after how-to/index.md to group how-to content together

## Deployment and Testing
- [x] Verified script runs successfully with `mise run zk-docs`
- [x] Changelog fragment added

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/99
2026-02-04 06:44:23 -08:00
Forgejo Actions
e405a48881 Update docs release to v1.2.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 05:18:37 +00:00