Remove the `-f` flag from `curl` so that a 404 on `/releases/latest`
doesn't fail the script when there are no releases yet.
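
Without `-f`, `curl` exits 0 on a 404, so the script has to look at the status code itself. A minimal sketch of that pattern follows; the function names, the `none` sentinel, and the URL argument are illustrative assumptions, not the script's actual code.

```shell
# Hypothetical helper: decide what to do with a /releases/latest response.
# $1 = HTTP status code, $2 = response body.
handle_latest_release() {
  status="$1"
  body="$2"
  case "$status" in
    200) printf '%s\n' "$body" ;;
    404) printf 'none\n' ;;   # no releases yet: treated as a normal outcome
    *)   echo "unexpected HTTP status: $status" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: fetch the URL without -f and inspect the status code.
# $1 = releases API URL (assumed to be supplied by the caller).
fetch_latest_release() {
  url="$1"
  tmp=$(mktemp) || return 1
  # -w '%{http_code}' prints the status; -o keeps the body out of stdout.
  status=$(curl -sS -o "$tmp" -w '%{http_code}' "$url") || { rm -f "$tmp"; return 1; }
  handle_latest_release "$status" "$(cat "$tmp")"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

Transport failures (DNS, connection refused) still make `curl` exit non-zero, so only the "no releases yet" case is softened.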
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Move all existing zettelkasten cards from `docs/` to `docs/zk/` as a temporary holding area
- Update `zk-docs` mise task to look in the new location
- Add `docs/README.md` explaining the Diataxis-based restructuring plan and target audiences
## Context
This is phase 1 of a multi-phase documentation restructuring effort. The goal is to reorganize docs to follow the Diataxis framework while serving multiple audiences:
1. Erich (owner) - knowledge graph/zk
2. Claude/AI agents - memory and context enrichment
3. New external readers - high-level overview
4. Potential operators/contributors - onboarding
5. Replicators - people wanting to duplicate the approach
## Testing
- [x] Verified `mise run zk-docs` still works with the new path
- [x] Updated obsidian.nvim config (in ~/.config/nvim) to point to new path
## Note
The obsidian.nvim config change is outside this repo but was made as part of this work.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/84
## Summary
- Add "Unhealthy Pods" stat panel showing count of pods in error states (ImagePullBackOff, CrashLoopBackOff, etc.) with red background when > 0
- Add "Pods by Waiting Reason" time series chart showing container waiting states over time
- Provides visibility into stuck pods that ArgoCD doesn't track (since it manages CronJobs, not the Jobs/Pods they spawn)
## Context
This addresses the issue where a `zim-watcher` cronjob pod was stuck in `ImagePullBackOff` for 11 days without any alerting. ArgoCD showed the CronJob as "Synced, Healthy" because it only manages the CronJob resource, not its spawned Jobs/Pods.
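
As a command-line cross-check of what the "Unhealthy Pods" panel counts, the same idea can be sketched against `kubectl` output. This is not the dashboard's query: the function name and the exact list of waiting reasons are assumptions for illustration.

```shell
# Count pods whose STATUS column shows a stuck waiting reason.
# Reads `kubectl get pods -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
# The reason list below is an assumed, non-exhaustive set.
count_unhealthy_pods() {
  awk '$4 ~ /(^|:)(ImagePullBackOff|ErrImagePull|CrashLoopBackOff|CreateContainerConfigError)$/ { n++ }
       END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | count_unhealthy_pods
```

The `(^|:)` prefix also catches init-container states such as `Init:CrashLoopBackOff`.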
## Deployment and Testing
- [ ] Sync grafana-config app to test branch
- [ ] Verify dashboard renders correctly
- [ ] Confirm "Unhealthy Pods" shows 0 (green) when no issues
- [ ] Reset to main after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/83
## Summary
- Move 21 blumeops-tagged zettelkasten cards from ~/code/personal/zk/ to docs/
- Create symlink ~/code/personal/zk/blumeops -> blumeops/docs for obsidian integration
- Update zk-docs mise task to read from local docs/ directory
- Add blumeops workspace to obsidian.nvim config (strict=true)
## Benefits
- Docs are now git-managed in the blumeops repo (visible on GitHub)
- Wiki links between blumeops docs continue to work via symlink
- obsidian-sync isolation: docs don't sync to work laptop
- Direct editing via obsidian.nvim with dedicated workspace
## Testing
- [x] Files moved to docs/ (21 files)
- [x] Symlink created: ~/code/personal/zk/blumeops -> blumeops/docs
- [x] zk-docs mise task updated and working
- [ ] Verify obsidian.nvim link resolution (after merge)
- [ ] Verify obsidian backlinks work
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/82
Use Quick Launch settings for Kagi search with suggestions instead of
the search widget, which is the proper way to configure keyboard-driven
search in homepage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Deploy Navidrome music streaming server to k8s
- NFS mount for music library from sifaka:/volume1/music (read-only)
- Local PVC for SQLite database and config (10Gi)
- Tailscale ingress for dj.tail8d86e.ts.net
- Caddy reverse proxy for dj.ops.eblu.me
- Homepage annotations for dashboard discovery in Media group
## Deployment and Testing
- [ ] Sync `apps` application to pick up new Application definition
- [ ] Set navidrome app to feature branch and sync
- [ ] Verify NFS mount with `kubectl exec`
- [ ] Provision Caddy for dj.ops.eblu.me
- [ ] Access https://dj.ops.eblu.me and create initial admin user
- [ ] Verify Homepage shows DJ in Media group
- [ ] Reset to main and resync after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/79
## Summary
- Fix ArgoCD icon (use `argo-cd.png` per Dashboard Icons naming)
- Add Borgmatic backup metrics widget (time since last backup, archive size)
- Add Sifaka NAS disk usage widget (used/total space)
- Create `[[grafana]]` zk card with management notes
## What didn't work
Attempted Grafana iframe embedding for a metrics panel but reverted:
- Homepage iframe widget only supports height classes, not width
- Some panels fail to load even with anonymous auth enabled
- Documented in grafana zk card for future reference
## Deployment and Testing
- [x] ArgoCD icon displays correctly
- [x] Borgmatic metrics show time since backup and archive size
- [x] NAS disk usage shows used/total bytes
- [x] Grafana reverted to authenticated-only access
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/76
## Summary
- Remove hajimari (unmaintained since Oct 2022, broken helm deps)
- Add gethomepage (28k stars, actively maintained, monthly releases)
- Migrate custom apps, bookmarks, and search config
- Enable k8s RBAC for service autodiscovery
- Configure Tailscale ingress at go.tail8d86e.ts.net
## Why the switch
Hajimari hasn't released since October 2022. The helm chart has a broken
dependency (bjw-s/common URL is 404), and unreleased code on main has bugs.
gethomepage has similar k8s autodiscovery via ingress annotations and is
very actively maintained.
## Deployment and Testing
- [ ] Delete hajimari app from ArgoCD
- [ ] Delete hajimari namespace
- [ ] Sync apps to pick up new homepage app
- [ ] Sync homepage app
- [ ] Verify go.ops.eblu.me loads
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/75
- Use chart from forge.ops.eblu.me/eblume/hajimari fork
- Use custom image from registry.ops.eblu.me/blumeops/hajimari
- Enables future customizations (search auto-focus, weather widget)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pre-install skopeo for pushing images to zot registry.
Docker 27's manifest format has compatibility issues with zot,
so we use skopeo for the push step.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Remove Tailscale sidecar from build-push-image action - registry.ops.eblu.me is directly reachable from k8s pods via Caddy
- Use skopeo for pushing images instead of docker push - Docker 27's manifest format has compatibility issues with zot registry
- Remove tailscale_authkey secret requirement from workflows
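
The push step could look roughly like the sketch below, which only prints the `skopeo copy` command rather than running it. The `blumeops` registry namespace, the function name, and the `<image>-v<version>` tag convention are assumptions inferred from names elsewhere in this log, not the action's actual implementation.

```shell
# Assumed registry namespace; adjust to the real repository path.
REGISTRY="registry.ops.eblu.me/blumeops"

# Print the skopeo command for a release tag such as "nettest-v0.10.0".
# Assumes the tag has the form <image>-v<version>.
skopeo_push_cmd() {
  tag="$1"
  image="${tag%-v*}"        # strip the trailing "-v<version>"
  version="v${tag##*-v}"    # keep only the version, re-adding the "v"
  # docker-daemon: reads the locally built image; docker:// pushes to the registry.
  printf 'skopeo copy docker-daemon:%s:%s docker://%s/%s:%s\n' \
    "$image" "$version" "$REGISTRY" "$image" "$version"
}

# Usage (dry run; remove the echo-style indirection by running the printed command):
#   skopeo_push_cmd nettest-v0.10.0
```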
## Deployment and Testing
- [x] Tested with nettest-v0.10.0 tag - build succeeded and image pushed to registry
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/74
Avoids duplicate "Infrastructure" groups since Hajimari doesn't
merge customApps with discovered apps.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Set Kagi as default search provider with simple-icons:kagi
- Add Google and DuckDuckGo as alternative search providers
- Explicitly enable showAppGroups, showAppUrls, showAppInfo
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fix immich app to track `main` branch instead of `feature/immich` for values
- The tailscale-operator ignoreDifferences schema drift will be fixed by syncing the `apps` app
## Deployment and Testing
- [ ] Sync `apps` to fix tailscale-operator schema drift
- [ ] Sync `immich` to pick up correct image versions from main
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/71
## Summary
- Update all metrics role defaults to install scripts to ~/.local/bin following XDG conventions
- Scripts already manually moved on indri from ~/bin to ~/.local/bin
- Cleaned up orphaned scripts (devpi-metrics, transmission-metrics, mcquack) and plist files
## Deployment and Testing
- [x] Manually moved scripts on indri
- [x] Deleted orphaned plist files (devpi-metrics, devpi, kiwix-serve, transmission-metrics)
- [x] Deleted orphaned scripts (devpi-metrics, transmission-metrics, mcquack)
- [x] Verified no metrics dependencies on orphaned scripts (checked alloy config and textfile directory)
- [ ] Run ansible to update LaunchAgent plist files with new paths
- [ ] Verify metrics collection continues working
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/70
## Summary
- Add fixed Y-axis (0-220M) so the 200M autovacuum threshold is always visible
- Add dashed threshold lines at 150M (yellow warning) and 200M (red danger)
- Update title to clarify the threshold
## Context
The raw XID age naturally trends upward between vacuum freezes, which looked alarming without context. Current values (~143K-216K) are roughly 0.1% of the 200M threshold, which is completely healthy.
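
The arithmetic behind that percentage, and the 150M / 200M bands drawn on the panel, can be checked with a short sketch. The function names are hypothetical; the thresholds are the ones stated above.

```shell
# Print an XID age as a percentage of the freeze threshold (two decimals).
# $1 = XID age, $2 = threshold (defaults to 200M).
xid_pct() {
  awk -v age="$1" -v max="${2:-200000000}" 'BEGIN { printf "%.2f\n", age / max * 100 }'
}

# Classify an XID age against the dashboard's 150M and 200M lines.
xid_status() {
  if [ "$1" -ge 200000000 ]; then echo danger
  elif [ "$1" -ge 150000000 ]; then echo warning
  else echo ok
  fi
}

xid_pct 216000      # prints 0.11
xid_status 216000   # prints ok
```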
## Deployment and Testing
- [ ] Sync grafana-config app to feature branch
- [ ] Verify threshold lines appear on PostgreSQL dashboard
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/69
1Password Connect doesn't support ?ssh-format=openssh, so we need a
separate Secure Note item with the OpenSSH-formatted key.
Created new 1Password item: argocd-forge-ssh-key
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete 13 .yaml.tpl files replaced by ExternalSecrets
- Update immich/README.md with direct CNPG secret copy instructions
- Update miniflux/README.md with context flag and ESO note
Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Upgrades Immich image tag from v2.4.1 to v2.5.0
## Deployment and Testing
- [ ] Point immich ArgoCD app at feature branch and sync
- [ ] Verify pods come up healthy
- [ ] Verify Immich web UI accessible
- [ ] Reset to main and sync after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/64
## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks
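
A rename like this is easy to get subtly wrong because unescaped dots in the hostname match any character. The sketch below shows one way to do the rewrite safely on a stream; the function and variable names are illustrative assumptions, and the PR itself may have been edited by hand.

```shell
OLD_HOST='registry.tail8d86e.ts.net'
NEW_HOST='registry.ops.eblu.me'

# Rewrite registry references in text read from stdin.
rewrite_registry_host() {
  # Escape dots so they match literally rather than any character.
  old_re=$(printf '%s' "$OLD_HOST" | sed 's/\./\\./g')
  sed "s/${old_re}/${NEW_HOST}/g"
}

# List tracked files that still mention the old host (run inside the repo):
#   git grep -lF "$OLD_HOST"
```

After the change, an empty result from the `git grep` line is a quick way to confirm that no references were missed.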
## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
## Summary
- Update CLAUDE.md with new service routing documentation
- Document the two DNS domains: `*.ops.eblu.me` (Caddy) vs `*.tail8d86e.ts.net` (Tailscale)
- Fix incorrect service listings (Prometheus/Loki are in k8s, not indri)
## ZK Updates (not in this PR)
Also updated the blumeops zk card with:
- Source code URL (forge is primary, GitHub is mirror)
- Services split into Caddy vs Tailscale sections
- Updated port map for Caddy
- Updated "Adding a New Service" instructions
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/57
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin
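
Before deploying the template, it can help to confirm that the running binary really includes the plugin. A minimal sketch, assuming the plugin registers its modules under the `layer4` namespace and that the binary was built with `xcaddy`; the function name is hypothetical.

```shell
# Succeeds if the module list on stdin includes the layer4 app or its submodules.
has_layer4() {
  grep -Eq '^layer4(\.|$)'
}

# Rebuild with the plugin (assumes xcaddy is installed):
#   xcaddy build --with github.com/mholt/caddy-l4
# Check the resulting binary:
#   ./caddy list-modules | has_layer4 && echo "layer4 present"
```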
## What This Enables
Git over SSH via `forge.ops.eblu.me:2222` is now reachable from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube
This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.
## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints
## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.
## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56