Commit graph

158 commits

Author SHA1 Message Date
9db4c9d9ae Replace homepage search widget with Quick Launch
Use Quick Launch settings for Kagi search with suggestions instead of
the search widget, which is the proper way to configure keyboard-driven
search in homepage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 15:08:35 -08:00
fc0ce955e4 Move DJ to Apps group on Homepage (#80)
## Summary
- Change navidrome homepage group from Media to Apps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/80
2026-01-31 20:29:43 -08:00
ade21cc49e Add Navidrome music streaming server (#79)
## Summary
- Deploy Navidrome music streaming server to k8s
- NFS mount for music library from sifaka:/volume1/music (read-only)
- Local PVC for SQLite database and config (10Gi)
- Tailscale ingress for dj.tail8d86e.ts.net
- Caddy reverse proxy for dj.ops.eblu.me
- Homepage annotations for dashboard discovery in Media group

## Deployment and Testing
- [ ] Sync `apps` application to pick up new Application definition
- [ ] Set navidrome app to feature branch and sync
- [ ] Verify NFS mount with `kubectl exec`
- [ ] Provision Caddy for dj.ops.eblu.me
- [ ] Access https://dj.ops.eblu.me and create initial admin user
- [ ] Verify Homepage shows DJ in Media group
- [ ] Reset to main and resync after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/79
2026-01-31 20:19:31 -08:00
b8b33b76c8 Remove Plex media server (#78)
## Summary
- Remove plex_metrics ansible role
- Remove Plex Grafana dashboard
- Remove Plex log collection from Alloy config
- Update indri-services-check to check Jellyfin instead of Plex

## Deployment and Testing
- [x] Unloaded plex-metrics LaunchAgent on indri
- [x] Deleted plex-metrics plist and script
- [x] Deleted plex.prom textfile
- [ ] Deploy Alloy config update
- [ ] Sync grafana-config to remove dashboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/78
2026-01-30 17:06:00 -08:00
bcc8685316 Add Jellyfin media server deployment (#77)
## Summary
- Add Jellyfin ansible role for native macOS deployment via Homebrew cask
- Add jellyfin_metrics role for Prometheus textfile metrics collection
- Add Caddy routing for jellyfin.ops.eblu.me
- Add Alloy log collection for Jellyfin stdout/stderr
- Add Grafana dashboard for Jellyfin monitoring

## Architecture
Jellyfin runs natively on indri (not in k8s) for full VideoToolbox hardware transcoding support. The M1 Mac Mini can handle ~3 concurrent 4K HDR→SDR transcoding streams.

## Deployment and Testing
- [ ] Deploy Jellyfin: `mise run provision-indri -- --tags jellyfin,jellyfin_metrics,caddy,alloy`
- [ ] Sync Grafana dashboard: `argocd app sync grafana-config`
- [ ] Complete Jellyfin setup wizard at https://jellyfin.ops.eblu.me
- [ ] Generate API key and save to `~/.jellyfin-api-key`
- [ ] Add media libraries (/Volumes/allisonflix/Movies, /Volumes/allisonflix/TV)
- [ ] Enable VideoToolbox hardware transcoding
- [ ] Verify metrics in Grafana dashboard
- [ ] Verify logs in Loki: `{service="jellyfin"}`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/77
2026-01-30 16:57:26 -08:00
23b8897c1f Add provider field to OpenWeatherMap widget
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:11:37 -08:00
e830a0f26b Homepage dashboard improvements (#76)
## Summary
- Fix ArgoCD icon (use `argo-cd.png` per Dashboard Icons naming)
- Add Borgmatic backup metrics widget (time since last backup, archive size)
- Add Sifaka NAS disk usage widget (used/total space)
- Create `[[grafana]]` zk card with management notes

## What didn't work
Attempted Grafana iframe embedding for a metrics panel but reverted:
- Homepage iframe widget only supports height classes, not width
- Some panels fail to load even with anonymous auth enabled
- Documented in grafana zk card for future reference

## Deployment and Testing
- [x] ArgoCD icon displays correctly
- [x] Borgmatic metrics show time since backup and archive size
- [x] NAS disk usage shows used/total bytes
- [x] Grafana reverted to authenticated-only access

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/76
2026-01-30 15:05:02 -08:00
38538ad5f0 Replace hajimari with gethomepage (#75)
## Summary
- Remove hajimari (unmaintained since Oct 2022, broken helm deps)
- Add gethomepage (28k stars, actively maintained, monthly releases)
- Migrate custom apps, bookmarks, and search config
- Enable k8s RBAC for service autodiscovery
- Configure Tailscale ingress at go.tail8d86e.ts.net

## Why the switch
Hajimari hasn't released since October 2022. The helm chart has a broken
dependency (bjw-s/common URL is 404), and unreleased code on main has bugs.
gethomepage has similar k8s autodiscovery via ingress annotations and is
very actively maintained.

## Deployment and Testing
- [ ] Delete hajimari app from ArgoCD
- [ ] Delete hajimari namespace
- [ ] Sync apps to pick up new homepage app
- [ ] Sync homepage app
- [ ] Verify go.ops.eblu.me loads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/75
2026-01-30 13:21:12 -08:00
9fd70c3892 Switch Hajimari to custom fork
- Use chart from forge.ops.eblu.me/eblume/hajimari fork
- Use custom image from registry.ops.eblu.me/blumeops/hajimari
- Enables future customizations (search auto-focus, weather widget)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:26:20 -08:00
4c852751db Update forgejo-runner to v2.2.0 (adds skopeo)
All checks were successful
Build Container / build (push) Successful in 13s
nettest-v0.11.0
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:13:54 -08:00
dc974858b0 Add skopeo to forgejo-runner image
All checks were successful
Build Container / build (push) Successful in 1m8s
forgejo-runner-v2.2.0
Pre-install skopeo for pushing images to zot registry.
Docker 27's manifest format has compatibility issues with zot,
so we use skopeo for the push step.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:11:19 -08:00
fd29244854 Simplify CI: remove Tailscale sidecar, use skopeo for push (#74)
## Summary
- Remove Tailscale sidecar from build-push-image action - registry.ops.eblu.me is directly reachable from k8s pods via Caddy
- Use skopeo for pushing images instead of docker push - Docker 27's manifest format has compatibility issues with zot registry
- Remove tailscale_authkey secret requirement from workflows

## Deployment and Testing
- [x] Tested with nettest-v0.10.0 tag - build succeeded and image pushed to registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/74
2026-01-30 10:18:20 -08:00
316a4c4e42 Shorten Hajimari info descriptions and hide URLs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:34:46 -08:00
1c63220dcd Rename bookmarks group from Docs to Admin
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:16:39 -08:00
a40e1dad1b Rename customApps group to "Host Services"
Avoids duplicate "Infrastructure" groups since Hajimari doesn't
merge customApps with discovered apps.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:14:47 -08:00
e9c5153a75 Disable Hajimari for Loki (no useful landing page)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:11:09 -08:00
303fb0ba05 Use simple-icons:immich for Immich dashboard icon
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:08:54 -08:00
08e5546c0b Configure Hajimari with Kagi search and app groups
- Set Kagi as default search provider with simple-icons:kagi
- Add Google and DuckDuckGo as alternative search providers
- Explicitly enable showAppGroups, showAppUrls, showAppInfo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:04:55 -08:00
d1164c8aac Add Hajimari service dashboard (#73)
## Summary
- Add Hajimari as a service dashboard/start page at `go.ops.eblu.me`
- Auto-discovers k8s services from ingress annotations
- Custom apps for non-k8s services: Forgejo, Registry, Sifaka NAS
- Add `nas.ops.eblu.me` Caddy proxy to Synology dashboard

## Services Configured

**Auto-discovered (k8s ingresses with hajimari.io annotations):**
- Grafana, ArgoCD, Prometheus, Loki (Observability)
- Miniflux, Kiwix, Transmission, TeslaMate, Immich (Apps)
- PyPI/devpi (Infrastructure)

**Custom apps (non-k8s):**
- Forgejo (forge.ops.eblu.me)
- Registry (registry.ops.eblu.me)
- Sifaka NAS (nas.ops.eblu.me)

**Bookmarks:**
- Tailscale Admin, 1Password, Pulumi

## Deployment and Testing
- [ ] Sync `apps` application to pick up new Hajimari Application
- [ ] Sync `hajimari` application
- [ ] Run `mise run provision-indri -- --tags caddy` for go/nas proxy entries
- [ ] Re-sync all k8s apps with hajimari annotations (or wait for natural drift)
- [ ] Verify https://go.ops.eblu.me shows dashboard with all services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/73
2026-01-29 15:51:42 -08:00
3c3c90f206 Rename github readme to avoid display issues on github.com 2026-01-29 11:32:55 -08:00
dae438a5ed Update Immich to v2.5.2 (#72)
## Summary
- Update Immich from v2.5.0 to v2.5.2

## Deployment and Testing
- [ ] Sync apps application: `argocd app sync apps`
- [ ] Point immich at feature branch: `argocd app set immich --revision update-immich-2.5.2`
- [ ] Sync immich: `argocd app sync immich`
- [ ] Verify pods restart and photos.ops.eblu.me is accessible
- [ ] After merge, reset to main: `argocd app set immich --revision main && argocd app sync immich`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/72
2026-01-29 11:25:22 -08:00
0e2df9645d Fix ArgoCD sync drift for apps and immich (#71)
## Summary
- Fix immich app to track `main` branch instead of `feature/immich` for values
- The tailscale-operator ignoreDifferences schema drift will be fixed by syncing the `apps` app

## Deployment and Testing
- [ ] Sync `apps` to fix tailscale-operator schema drift
- [ ] Sync `immich` to pick up correct image versions from main

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/71
2026-01-29 10:24:26 -08:00
2bc826c31f Move metrics scripts from ~/bin to ~/.local/bin (#70)
## Summary
- Update all metrics role defaults to install scripts to ~/.local/bin following XDG conventions
- Scripts already manually moved on indri from ~/bin to ~/.local/bin
- Cleaned up orphaned scripts (devpi-metrics, transmission-metrics, mcquack) and plist files

## Deployment and Testing
- [x] Manually moved scripts on indri
- [x] Deleted orphaned plist files (devpi-metrics, devpi, kiwix-serve, transmission-metrics)
- [x] Deleted orphaned scripts (devpi-metrics, transmission-metrics, mcquack)
- [x] Verified no metrics dependencies on orphaned scripts (checked alloy config and textfile directory)
- [ ] Run ansible to update LaunchAgent plist files with new paths
- [ ] Verify metrics collection continues working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/70
2026-01-29 09:59:38 -08:00
0d8eb651d4 Fix XID Age graph to show threshold context (#69)
All checks were successful
Build Container / build (push) Successful in 1m10s
devpi-v1.0.1
## Summary
- Add fixed Y-axis (0-220M) so the 200M autovacuum threshold is always visible
- Add dashed threshold lines at 150M (yellow warning) and 200M (red danger)
- Update title to clarify the threshold

## Context
The raw XID age naturally trends upward between vacuum freezes, which looked alarming without context. Current values (~143K-216K) are at 0.1% of the threshold - completely healthy.

## Deployment and Testing
- [ ] Sync grafana-config app to feature branch
- [ ] Verify threshold lines appear on PostgreSQL dashboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/69
2026-01-29 07:08:21 -08:00
0604877db2 Add 'Tesla' prefix to all TeslaMate dashboard titles (#68)
## Summary
- Renamed all 18 TeslaMate Grafana dashboards to include "Tesla" prefix
- Improves organization and discoverability in the dashboard list

## Deployment and Testing
- [ ] Sync grafana-config app: `argocd app set grafana-config --revision feature/rename-tesla-dashboards && argocd app sync grafana-config`
- [ ] Verify dashboards display with "Tesla" prefix in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/68
2026-01-29 06:55:44 -08:00
46081f5f10 Update rule 10: also require permission to push to main
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:45:44 -08:00
55522579dc Add rule: never merge PRs without explicit user request
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:43:51 -08:00
a93f2a77e1 Merge pull request 'Migrate remaining secrets to ExternalSecrets' (#67) from feature/migrate-remaining-secrets into main 2026-01-28 20:41:45 -08:00
8f4660915d Fix argocd SSH key format for 1Password Connect
1Password Connect doesn't support ?ssh-format=openssh, so we need a
separate Secure Note item with the OpenSSH-formatted key.

Created new 1Password item: argocd-forge-ssh-key

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:31:16 -08:00
9114aac8f6 Switch all ExternalSecrets to creationPolicy: Owner
ESO now has full ownership of these secrets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:27:16 -08:00
dd6cf20d51 Remove obsolete secret templates
- Delete 13 .yaml.tpl files replaced by ExternalSecrets
- Update immich/README.md with direct CNPG secret copy instructions
- Update miniflux/README.md with context flag and ESO note

Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:26:37 -08:00
351528474c Add ExternalSecrets for remaining k8s secrets
Migrate 10 secret templates to ESO ExternalSecrets with 1Password Connect:
- databases: eblume, borgmatic, teslamate passwords
- tailscale-operator: OAuth client credentials
- grafana-config: admin password, teslamate datasource
- teslamate: db password, encryption key
- forgejo-runner: runner registration token
- argocd: forge SSH credentials

All use creationPolicy: Merge for safe migration from existing secrets.

Skipped:
- miniflux/secret-db: Uses CNPG secret, not 1Password directly
- immich/secret-db: Requires 1Password item creation first
- 1password-connect: Bootstrap secret, must stay as template

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 19:50:38 -08:00
482414346e Add External Secrets Operator with 1Password Connect (#66) (#66)
## Summary
- Add 1Password Connect server for secrets automation API
- Add External Secrets Operator (ESO) to sync secrets from 1Password to K8s
- Add ClusterSecretStore connecting ESO to 1Password Connect
- Convert devpi secret to ExternalSecret as proof of concept

## Architecture
```
1Password Cloud → 1Password Connect (k8s) → ESO → Native K8s Secrets
```

## Deployment and Testing
- [ ] Mirror Helm charts to forge (connect-helm-charts, external-secrets) - DONE
- [ ] Create 1Password Connect credentials (`op connect server create`)
- [ ] Store credentials in 1Password item "1Password Connect"
- [ ] Bootstrap secret: `op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | kubectl apply -f -`
- [ ] Deploy 1password-connect: `argocd app sync 1password-connect`
- [ ] Deploy external-secrets: `argocd app sync external-secrets`
- [ ] Deploy external-secrets-config: `argocd app sync external-secrets-config`
- [ ] Test devpi ExternalSecret: `argocd app sync devpi`
- [ ] Verify secret synced: `kubectl get externalsecret -n devpi`

## Future Work
After PoC validated, migrate remaining 12 secret templates to ExternalSecrets:
- databases (3), tailscale-operator (1), grafana-config (2), teslamate (2)
- forgejo-runner (1), argocd (1), immich (1), 1password-connect (1 - self-bootstrap)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/66
2026-01-28 19:30:10 -08:00
3971670832 Remove immich-sync ansible role (#65)
## Summary
- Remove immich_sync ansible role (server-side photo sync via osxphotos)
- The Immich iOS app has built-in automatic backup that replaces this functionality
- iOS app supports foreground/background backup and can sync iCloud photos directly

## Deployment and Testing
- [ ] Clean up files on indri (see manual cleanup commands below)
- [ ] Configure Immich iOS app for automatic backup

### Manual cleanup on indri:
```bash
# Unload and remove LaunchAgent
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist
rm ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist

# Remove script and credentials
rm ~/bin/immich-sync.sh
rm ~/.immich-api-key

# Remove logs
rm ~/Library/Logs/mcquack.immich-sync.*.log

# Optionally remove export directory (check if empty first)
ls ~/Pictures/immich-export
# rm -r ~/Pictures/immich-export
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/65
2026-01-28 08:49:22 -08:00
aa4464f84e Upgrade Immich from v2.4.1 to v2.5.0 (#64)
## Summary
- Upgrades Immich image tag from v2.4.1 to v2.5.0

## Deployment and Testing
- [ ] Point immich ArgoCD app at feature branch and sync
- [ ] Verify pods come up healthy
- [ ] Verify Immich web UI accessible
- [ ] Reset to main and sync after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/64
2026-01-27 20:51:09 -08:00
54945be0e3 Add immich-sync ansible role for photo library sync (#63)
## Summary
- Add `immich_sync` role that syncs macOS Photos Library to Immich
- Uses osxphotos to export photos with metadata to staging directory
- Uses immich-cli (via Docker) to upload to Immich server
- LaunchAgent schedules hourly syncs following mcquack pattern
- API key fetched from 1Password in playbook pre_tasks

## Architecture
```
Photos Library → osxphotos export → ~/Pictures/immich-export/ → immich-cli upload → Immich
```

## Prerequisites (manual)
- Install osxphotos on indri: Add `"pipx:osxphotos" = "latest"` to `~/.config/mise/config.toml`, run `mise install`
- Docker is already installed on indri

## Deployment and Testing
- [ ] Dry run: `mise run provision-indri -- --tags immich_sync --check --diff`
- [ ] Deploy: `mise run provision-indri -- --tags immich_sync`
- [ ] Verify LaunchAgent: `ssh indri 'launchctl list | grep immich'`
- [ ] Test manual sync: `ssh indri '~/bin/immich-sync.sh'`
- [ ] Check logs: `ssh indri 'tail -50 ~/Library/Logs/mcquack.immich-sync.out.log'`
- [ ] Verify photos in Immich at https://photos.ops.eblu.me

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/63
2026-01-26 12:38:38 -08:00
8621996343 Add Immich photo management + migrate forge URLs (#62)
## Summary
- Migrate all ArgoCD app repo URLs from `indri.tail8d86e.ts.net:2200` to `forge.ops.eblu.me:2222`
- Add Immich self-hosted photo management service with:
  - Helm chart deployment via ArgoCD
  - PostgreSQL cluster with pgvecto.rs for AI vector search (immich-pg)
  - NFS storage on sifaka for photo library (2Ti)
  - Tailscale Ingress + Caddy proxy for `photos.ops.eblu.me`
  - Machine learning service for face/object recognition

## Deployment and Testing
- [x] Update ArgoCD repo-creds-forge secret with new URL (one-time manual step)
- [ ] Sync `apps` to pick up new applications
- [ ] Sync all existing apps to verify new forge URL works
- [ ] Sync `blumeops-pg` to deploy immich-pg cluster
- [ ] Wait for immich-pg to be healthy
- [ ] Create immich-db secret from auto-generated password
- [ ] Sync `immich-storage` (PV, PVC, Ingress)
- [ ] Sync `immich` (Helm chart)
- [ ] Run `mise run provision-indri -- --tags caddy` to add photos.ops.eblu.me
- [ ] Verify Immich UI is accessible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/62
2026-01-26 11:20:11 -08:00
c8b655f177 Build local containers for k8s services (#61)
## Summary
- Move devpi Dockerfile from argocd/manifests to containers/devpi/
- Add containers for: transmission, teslamate, miniflux, kiwix-serve, kubectl
- Update all k8s deployments to use local images (registry.ops.eblu.me/blumeops/*)
- All containers use v1.0.0 tag for initial release

## Containers Added
| Container | Source | Notes |
|-----------|--------|-------|
| devpi | python:3.12-slim | Existing, moved to containers/ |
| kubectl | alpine + download | For zim-watcher CronJob |
| miniflux | Go build from source | v2.2.16 |
| kiwix-serve | Download pre-built binary | v3.8.1 |
| transmission | alpine + apk install | Simpler than linuxserver image |
| teslamate | Elixir build from source | v2.2.0 |

## Deployment and Testing
- [ ] Build and tag devpi-v1.0.0
- [ ] Build and tag kubectl-v1.0.0
- [ ] Build and tag miniflux-v1.0.0
- [ ] Build and tag kiwix-serve-v1.0.0
- [ ] Build and tag transmission-v1.0.0
- [ ] Build and tag teslamate-v1.0.0
- [ ] Sync ArgoCD apps and verify services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/61
2026-01-25 21:35:57 -08:00
ea42362b6f Migrate Forgejo runner to Kubernetes with DinD (#60)
## Summary
- Deploy Forgejo runner to k8s with Docker-in-Docker sidecar
- Add job execution image with Node.js and Docker CLI
- Retire host-mode runner on indri
- All CI jobs now run containerized in k8s

## Components Added
- `containers/forgejo-runner/Dockerfile` - Job execution image
- `argocd/apps/forgejo-runner.yaml` - ArgoCD Application
- `argocd/manifests/forgejo-runner/` - Kubernetes manifests

## Components Removed
- `ansible/roles/forgejo_runner/` - No longer needed

## Changes to Existing Files
- `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env
- `.github/actionlint.yaml` - Only `k8s` label now valid

## Deployment
1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -`
2. Sync ArgoCD: `argocd app sync forgejo-runner`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60
2026-01-25 19:56:17 -08:00
66badfafd1 Migrate k8s services to Caddy (*.ops.eblu.me) (#59)
All checks were successful
Build Container / build (push) Successful in 13s
nettest-v0.8.0
## Summary
- Add Caddy reverse proxy routes for all k8s services (grafana, argocd, prometheus, loki, miniflux, devpi, kiwix, torrent, teslamate)
- Add PostgreSQL via Caddy L4 TCP proxy on port 5432
- Caddy proxies to existing Tailscale endpoints - traffic stays local on indri
- Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs continue to work

## Updated References
- Alloy: prometheus/loki push endpoints → `*.ops.eblu.me`
- Borgmatic: PostgreSQL backup host → `pg.ops.eblu.me`
- Devpi: DEVPI_OUTSIDE_URL → `pypi.ops.eblu.me`
- indri-services-check: health check URLs
- CLAUDE.md: argocd login command

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy new Caddy config
- [ ] Test HTTP services: `curl https://grafana.ops.eblu.me/api/health`
- [ ] Test PostgreSQL: `pg_isready -h pg.ops.eblu.me -p 5432`
- [ ] Run `mise run provision-indri -- --tags alloy` to update Alloy endpoints
- [ ] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic
- [ ] Sync devpi in ArgoCD: `argocd app sync devpi`
- [ ] Re-login to ArgoCD: `argocd login argocd.ops.eblu.me ...`
- [ ] Run `mise run indri-services-check` to verify all services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/59
2026-01-25 12:56:31 -08:00
d6e6b48f6a Migrate registry to Caddy (registry.ops.eblu.me) (#58)
## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
2026-01-25 12:06:15 -08:00
9c1b7c7ca1 Update docs for Caddy migration (#57)
## Summary
- Update CLAUDE.md with new service routing documentation
- Document the two DNS domains: `*.ops.eblu.me` (Caddy) vs `*.tail8d86e.ts.net` (Tailscale)
- Fix incorrect service listings (Prometheus/Loki are in k8s, not indri)

## ZK Updates (not in this PR)
Also updated the blumeops zk card with:
- Source code URL (forge is primary, GitHub is mirror)
- Services split into Caddy vs Tailscale sections
- Updated port map for Caddy
- Updated "Adding a New Service" instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/57
2026-01-25 11:52:35 -08:00
1184b4de1d Add Caddy layer4 for Forgejo SSH (#56)
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin

## What This Enables
Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube

This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.

## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints

## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.

## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56
2026-01-25 11:37:23 -08:00
682a68dc9c Add Caddy reverse proxy for blumeops services (#55)
## Summary
- Add Caddy ansible role following zot pattern (manual build, ansible deploy)
- Caddy built with Gandi DNS plugin for ACME DNS-01 challenges
- Gandi PAT fetched from 1Password and written to secured file on indri
- Configure wildcard TLS for `*.ops.eblu.me`
- Initial services: forge, registry (indri-local)
- Uses port 8443 during testing to avoid Tailscale serve conflicts

## Build Instructions (already done)
On indri:
```bash
cd ~/code/3rd/caddy && mise run build
```

## Deployment and Testing
- [ ] Review Caddyfile configuration
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy
- [ ] Test: `curl -v https://forge.ops.eblu.me:8443` (should get TLS cert)
- [ ] Test: `curl -v https://registry.ops.eblu.me:8443/v2/` (should return `{}`)
- [ ] Once verified, switch to port 443 and migrate services from Tailscale serve

## Files Changed
- `ansible/playbooks/indri.yml` - Add pre_task for Gandi PAT, add caddy role
- `ansible/roles/caddy/` - New role with Caddyfile and LaunchAgent templates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/55
2026-01-25 09:35:06 -08:00
b08faa50cc Add Gandi DNS management via Pulumi (#54)
## Summary
- Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/`
- Add Gandi LiveDNS management for `eblu.me` domain
- Create wildcard DNS record `*.ops.eblu.me` → indri's Tailscale IP (100.98.163.89)
- Add mise tasks: `dns-up`, `dns-preview`
- Update `tailnet-up` to pass `--yes` by default
- Document PAT cycling process (expires every 30 days)

## Background
This enables using real DNS names (`*.ops.eblu.me`) that resolve to Tailscale IPs,
which allows containers and other systems to resolve services without depending on
MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain
tailnet-only while using standard DNS.

## Deployment and Testing
- [ ] Run `cd pulumi/gandi && uv sync` to install dependencies
- [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack
- [ ] Run `mise run dns-preview` to verify configuration
- [ ] Run `mise run dns-up` to apply DNS records
- [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54
2026-01-25 08:15:46 -08:00
af3536bc17 Simplify indri IP extraction from tailscale status
Some checks failed
Build Container / build (push) Failing after 13s
nettest-v0.7.0
Use simple grep and awk to parse plain text tailscale status output
instead of trying to parse JSON. Also show the status output for
debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 20:04:28 -08:00
04fdd5906d Get indri's IP from tailscale status for registry access
Some checks failed
Build Container / build (push) Failing after 9s
nettest-v0.6.0
Use 'tailscale status' to get indri's Tailscale IP and add it to
/etc/hosts for registry hostname resolution. The registry service
runs on indri, so we need indri's IP specifically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 20:00:57 -08:00
16ccabbc34 Resolve registry IP via Tailscale and add to /etc/hosts
Some checks failed
Build Container / build (push) Failing after 8s
nettest-v0.5.0
The Tailscale container's DNS doesn't work because it runs in userspace
mode. Instead, resolve the registry IP using 'tailscale ip' and add it
to /etc/hosts inside the container before running skopeo.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:44:39 -08:00
b4b8abb2d9 Add tag:ci-gateway to Tailscale ACL for CI builds
Some checks failed
Build Container / build (push) Failing after 13s
nettest-v0.4.0
- Add tag:ci-gateway to tagOwners
- Grant ci-gateway access to registry on port 443
- Add test for ci-gateway -> registry access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:38:02 -08:00
9f98b3007e Add debugging for Tailscale container failures
Some checks failed
Build Container / build (push) Failing after 7s
nettest-v0.3.0
Capture container logs when the Tailscale sidecar exits unexpectedly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:32:40 -08:00