Commit graph

154 commits

Author SHA1 Message Date
caa626a4da Fix Homepage layout and add icon colors
Layout:
- Remove row style from Admin bookmarks (now vertical)
- Add Apps, Observability, Infrastructure groups to layout

Icons:
- Add brand colors to si- icons so they're not greyed out
- Grafana #F46800, Prometheus #E6522C, Tesla #CC0000
- ArgoCD #EF7B4D, Immich #4250AF, PyPI #3775A9

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 13:00:59 -08:00
75099d29c5 Add pod-selector annotations for Homepage status checks
Homepage was looking for pods with labels matching the ingress name
(e.g., app.kubernetes.io/name=grafana-tailscale) but actual pods have
different labels (e.g., app.kubernetes.io/name=grafana).

Added gethomepage.dev/pod-selector to each ingress to specify the
correct label selector for pod status checks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 12:53:25 -08:00
8e1484d546 Migrate ingress annotations from hajimari to gethomepage
Convert all hajimari.io/* annotations to gethomepage.dev/* format:
- hajimari.io/enable -> gethomepage.dev/enabled
- hajimari.io/appName -> gethomepage.dev/name
- hajimari.io/group -> gethomepage.dev/group
- hajimari.io/icon -> gethomepage.dev/icon (si- or mdi- prefix)
- hajimari.io/info -> gethomepage.dev/description
- hajimari.io/url -> gethomepage.dev/href

Updated services: grafana, loki, prometheus, teslamate, miniflux,
kiwix, argocd, immich, devpi, torrent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 12:20:29 -08:00
4d544f6941 Replace hajimari with gethomepage
Hajimari has been unmaintained since 2022 with broken helm chart
dependencies. Switching to gethomepage which is actively maintained
(28k stars, monthly releases) and has similar k8s autodiscovery.

- Remove hajimari ArgoCD app and manifests
- Add homepage ArgoCD app with jameswynn helm chart
- Migrate custom apps, bookmarks, and search config
- Enable RBAC for k8s service autodiscovery
- Configure Tailscale ingress at go.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 12:05:44 -08:00
9fd70c3892 Switch Hajimari to custom fork
- Use chart from forge.ops.eblu.me/eblume/hajimari fork
- Use custom image from registry.ops.eblu.me/blumeops/hajimari
- Enables future customizations (search auto-focus, weather widget)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:26:20 -08:00
4c852751db Update forgejo-runner to v2.2.0 (adds skopeo)
All checks were successful
Build Container / build (push) Successful in 13s
nettest-v0.11.0
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:13:54 -08:00
dc974858b0 Add skopeo to forgejo-runner image
All checks were successful
Build Container / build (push) Successful in 1m8s
forgejo-runner-v2.2.0
Pre-install skopeo for pushing images to zot registry.
Docker 27's manifest format has compatibility issues with zot,
so we use skopeo for the push step.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 11:11:19 -08:00
fd29244854 Simplify CI: remove Tailscale sidecar, use skopeo for push (#74)
## Summary
- Remove Tailscale sidecar from build-push-image action - registry.ops.eblu.me is directly reachable from k8s pods via Caddy
- Use skopeo for pushing images instead of docker push - Docker 27's manifest format has compatibility issues with zot registry
- Remove tailscale_authkey secret requirement from workflows

## Deployment and Testing
- [x] Tested with nettest-v0.10.0 tag - build succeeded and image pushed to registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/74
2026-01-30 10:18:20 -08:00
316a4c4e42 Shorten Hajimari info descriptions and hide URLs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:34:46 -08:00
1c63220dcd Rename bookmarks group from Docs to Admin
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:16:39 -08:00
a40e1dad1b Rename customApps group to "Host Services"
Avoids duplicate "Infrastructure" groups since Hajimari doesn't
merge customApps with discovered apps.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:14:47 -08:00
e9c5153a75 Disable Hajimari for Loki (no useful landing page)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:11:09 -08:00
303fb0ba05 Use simple-icons:immich for Immich dashboard icon
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:08:54 -08:00
08e5546c0b Configure Hajimari with Kagi search and app groups
- Set Kagi as default search provider with simple-icons:kagi
- Add Google and DuckDuckGo as alternative search providers
- Explicitly enable showAppGroups, showAppUrls, showAppInfo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 16:04:55 -08:00
d1164c8aac Add Hajimari service dashboard (#73)
## Summary
- Add Hajimari as a service dashboard/start page at `go.ops.eblu.me`
- Auto-discovers k8s services from ingress annotations
- Custom apps for non-k8s services: Forgejo, Registry, Sifaka NAS
- Add `nas.ops.eblu.me` Caddy proxy to Synology dashboard

## Services Configured

**Auto-discovered (k8s ingresses with hajimari.io annotations):**
- Grafana, ArgoCD, Prometheus, Loki (Observability)
- Miniflux, Kiwix, Transmission, TeslaMate, Immich (Apps)
- PyPI/devpi (Infrastructure)

**Custom apps (non-k8s):**
- Forgejo (forge.ops.eblu.me)
- Registry (registry.ops.eblu.me)
- Sifaka NAS (nas.ops.eblu.me)

**Bookmarks:**
- Tailscale Admin, 1Password, Pulumi

## Deployment and Testing
- [ ] Sync `apps` application to pick up new Hajimari Application
- [ ] Sync `hajimari` application
- [ ] Run `mise run provision-indri -- --tags caddy` for go/nas proxy entries
- [ ] Re-sync all k8s apps with hajimari annotations (or wait for natural drift)
- [ ] Verify https://go.ops.eblu.me shows dashboard with all services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/73
2026-01-29 15:51:42 -08:00
3c3c90f206 Rename github readme to avoid display issues on github.com 2026-01-29 11:32:55 -08:00
dae438a5ed Update Immich to v2.5.2 (#72)
## Summary
- Update Immich from v2.5.0 to v2.5.2

## Deployment and Testing
- [ ] Sync apps application: `argocd app sync apps`
- [ ] Point immich at feature branch: `argocd app set immich --revision update-immich-2.5.2`
- [ ] Sync immich: `argocd app sync immich`
- [ ] Verify pods restart and photos.ops.eblu.me is accessible
- [ ] After merge, reset to main: `argocd app set immich --revision main && argocd app sync immich`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/72
2026-01-29 11:25:22 -08:00
0e2df9645d Fix ArgoCD sync drift for apps and immich (#71)
## Summary
- Fix immich app to track `main` branch instead of `feature/immich` for values
- The tailscale-operator ignoreDifferences schema drift will be fixed by syncing the `apps` app

## Deployment and Testing
- [ ] Sync `apps` to fix tailscale-operator schema drift
- [ ] Sync `immich` to pick up correct image versions from main

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/71
2026-01-29 10:24:26 -08:00
2bc826c31f Move metrics scripts from ~/bin to ~/.local/bin (#70)
## Summary
- Update all metrics role defaults to install scripts to ~/.local/bin following XDG conventions
- Scripts already manually moved on indri from ~/bin to ~/.local/bin
- Cleaned up orphaned scripts (devpi-metrics, transmission-metrics, mcquack) and plist files

## Deployment and Testing
- [x] Manually moved scripts on indri
- [x] Deleted orphaned plist files (devpi-metrics, devpi, kiwix-serve, transmission-metrics)
- [x] Deleted orphaned scripts (devpi-metrics, transmission-metrics, mcquack)
- [x] Verified no metrics dependencies on orphaned scripts (checked alloy config and textfile directory)
- [ ] Run ansible to update LaunchAgent plist files with new paths
- [ ] Verify metrics collection continues working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/70
2026-01-29 09:59:38 -08:00
0d8eb651d4 Fix XID Age graph to show threshold context (#69)
All checks were successful
Build Container / build (push) Successful in 1m10s
devpi-v1.0.1
## Summary
- Add fixed Y-axis (0-220M) so the 200M autovacuum threshold is always visible
- Add dashed threshold lines at 150M (yellow warning) and 200M (red danger)
- Update title to clarify the threshold

## Context
The raw XID age naturally trends upward between vacuum freezes, which looked alarming without context. Current values (~143K-216K) are at 0.1% of the threshold - completely healthy.

## Deployment and Testing
- [ ] Sync grafana-config app to feature branch
- [ ] Verify threshold lines appear on PostgreSQL dashboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/69
2026-01-29 07:08:21 -08:00
0604877db2 Add 'Tesla' prefix to all TeslaMate dashboard titles (#68)
## Summary
- Renamed all 18 TeslaMate Grafana dashboards to include "Tesla" prefix
- Improves organization and discoverability in the dashboard list

## Deployment and Testing
- [ ] Sync grafana-config app: `argocd app set grafana-config --revision feature/rename-tesla-dashboards && argocd app sync grafana-config`
- [ ] Verify dashboards display with "Tesla" prefix in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/68
2026-01-29 06:55:44 -08:00
46081f5f10 Update rule 10: also require permission to push to main
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:45:44 -08:00
55522579dc Add rule: never merge PRs without explicit user request
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:43:51 -08:00
a93f2a77e1 Merge pull request 'Migrate remaining secrets to ExternalSecrets' (#67) from feature/migrate-remaining-secrets into main 2026-01-28 20:41:45 -08:00
8f4660915d Fix argocd SSH key format for 1Password Connect
1Password Connect doesn't support ?ssh-format=openssh, so we need a
separate Secure Note item with the OpenSSH-formatted key.

Created new 1Password item: argocd-forge-ssh-key

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:31:16 -08:00
9114aac8f6 Switch all ExternalSecrets to creationPolicy: Owner
ESO now has full ownership of these secrets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:27:16 -08:00
dd6cf20d51 Remove obsolete secret templates
- Delete 13 .yaml.tpl files replaced by ExternalSecrets
- Update immich/README.md with direct CNPG secret copy instructions
- Update miniflux/README.md with context flag and ESO note

Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:26:37 -08:00
351528474c Add ExternalSecrets for remaining k8s secrets
Migrate 10 secret templates to ESO ExternalSecrets with 1Password Connect:
- databases: eblume, borgmatic, teslamate passwords
- tailscale-operator: OAuth client credentials
- grafana-config: admin password, teslamate datasource
- teslamate: db password, encryption key
- forgejo-runner: runner registration token
- argocd: forge SSH credentials

All use creationPolicy: Merge for safe migration from existing secrets.

Skipped:
- miniflux/secret-db: Uses CNPG secret, not 1Password directly
- immich/secret-db: Requires 1Password item creation first
- 1password-connect: Bootstrap secret, must stay as template

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 19:50:38 -08:00
482414346e Add External Secrets Operator with 1Password Connect (#66) (#66)
## Summary
- Add 1Password Connect server for secrets automation API
- Add External Secrets Operator (ESO) to sync secrets from 1Password to K8s
- Add ClusterSecretStore connecting ESO to 1Password Connect
- Convert devpi secret to ExternalSecret as proof of concept

## Architecture
```
1Password Cloud → 1Password Connect (k8s) → ESO → Native K8s Secrets
```

## Deployment and Testing
- [ ] Mirror Helm charts to forge (connect-helm-charts, external-secrets) - DONE
- [ ] Create 1Password Connect credentials (`op connect server create`)
- [ ] Store credentials in 1Password item "1Password Connect"
- [ ] Bootstrap secret: `op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | kubectl apply -f -`
- [ ] Deploy 1password-connect: `argocd app sync 1password-connect`
- [ ] Deploy external-secrets: `argocd app sync external-secrets`
- [ ] Deploy external-secrets-config: `argocd app sync external-secrets-config`
- [ ] Test devpi ExternalSecret: `argocd app sync devpi`
- [ ] Verify secret synced: `kubectl get externalsecret -n devpi`

## Future Work
After PoC validated, migrate remaining 12 secret templates to ExternalSecrets:
- databases (3), tailscale-operator (1), grafana-config (2), teslamate (2)
- forgejo-runner (1), argocd (1), immich (1), 1password-connect (1 - self-bootstrap)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/66
2026-01-28 19:30:10 -08:00
3971670832 Remove immich-sync ansible role (#65)
## Summary
- Remove immich_sync ansible role (server-side photo sync via osxphotos)
- The Immich iOS app has built-in automatic backup that replaces this functionality
- iOS app supports foreground/background backup and can sync iCloud photos directly

## Deployment and Testing
- [ ] Clean up files on indri (see manual cleanup commands below)
- [ ] Configure Immich iOS app for automatic backup

### Manual cleanup on indri:
```bash
# Unload and remove LaunchAgent
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist
rm ~/Library/LaunchAgents/mcquack.eblume.immich-sync.plist

# Remove script and credentials
rm ~/bin/immich-sync.sh
rm ~/.immich-api-key

# Remove logs
rm ~/Library/Logs/mcquack.immich-sync.*.log

# Optionally remove export directory (check if empty first)
ls ~/Pictures/immich-export
# rm -r ~/Pictures/immich-export
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/65
2026-01-28 08:49:22 -08:00
aa4464f84e Upgrade Immich from v2.4.1 to v2.5.0 (#64)
## Summary
- Upgrades Immich image tag from v2.4.1 to v2.5.0

## Deployment and Testing
- [ ] Point immich ArgoCD app at feature branch and sync
- [ ] Verify pods come up healthy
- [ ] Verify Immich web UI accessible
- [ ] Reset to main and sync after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/64
2026-01-27 20:51:09 -08:00
54945be0e3 Add immich-sync ansible role for photo library sync (#63)
## Summary
- Add `immich_sync` role that syncs macOS Photos Library to Immich
- Uses osxphotos to export photos with metadata to staging directory
- Uses immich-cli (via Docker) to upload to Immich server
- LaunchAgent schedules hourly syncs following mcquack pattern
- API key fetched from 1Password in playbook pre_tasks

## Architecture
```
Photos Library → osxphotos export → ~/Pictures/immich-export/ → immich-cli upload → Immich
```

## Prerequisites (manual)
- Install osxphotos on indri: Add `"pipx:osxphotos" = "latest"` to `~/.config/mise/config.toml`, run `mise install`
- Docker is already installed on indri

## Deployment and Testing
- [ ] Dry run: `mise run provision-indri -- --tags immich_sync --check --diff`
- [ ] Deploy: `mise run provision-indri -- --tags immich_sync`
- [ ] Verify LaunchAgent: `ssh indri 'launchctl list | grep immich'`
- [ ] Test manual sync: `ssh indri '~/bin/immich-sync.sh'`
- [ ] Check logs: `ssh indri 'tail -50 ~/Library/Logs/mcquack.immich-sync.out.log'`
- [ ] Verify photos in Immich at https://photos.ops.eblu.me

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/63
2026-01-26 12:38:38 -08:00
8621996343 Add Immich photo management + migrate forge URLs (#62)
## Summary
- Migrate all ArgoCD app repo URLs from `indri.tail8d86e.ts.net:2200` to `forge.ops.eblu.me:2222`
- Add Immich self-hosted photo management service with:
  - Helm chart deployment via ArgoCD
  - PostgreSQL cluster with pgvecto.rs for AI vector search (immich-pg)
  - NFS storage on sifaka for photo library (2Ti)
  - Tailscale Ingress + Caddy proxy for `photos.ops.eblu.me`
  - Machine learning service for face/object recognition

## Deployment and Testing
- [x] Update ArgoCD repo-creds-forge secret with new URL (one-time manual step)
- [ ] Sync `apps` to pick up new applications
- [ ] Sync all existing apps to verify new forge URL works
- [ ] Sync `blumeops-pg` to deploy immich-pg cluster
- [ ] Wait for immich-pg to be healthy
- [ ] Create immich-db secret from auto-generated password
- [ ] Sync `immich-storage` (PV, PVC, Ingress)
- [ ] Sync `immich` (Helm chart)
- [ ] Run `mise run provision-indri -- --tags caddy` to add photos.ops.eblu.me
- [ ] Verify Immich UI is accessible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/62
2026-01-26 11:20:11 -08:00
c8b655f177 Build local containers for k8s services (#61)
## Summary
- Move devpi Dockerfile from argocd/manifests to containers/devpi/
- Add containers for: transmission, teslamate, miniflux, kiwix-serve, kubectl
- Update all k8s deployments to use local images (registry.ops.eblu.me/blumeops/*)
- All containers use v1.0.0 tag for initial release

## Containers Added
| Container | Source | Notes |
|-----------|--------|-------|
| devpi | python:3.12-slim | Existing, moved to containers/ |
| kubectl | alpine + download | For zim-watcher CronJob |
| miniflux | Go build from source | v2.2.16 |
| kiwix-serve | Download pre-built binary | v3.8.1 |
| transmission | alpine + apk install | Simpler than linuxserver image |
| teslamate | Elixir build from source | v2.2.0 |

## Deployment and Testing
- [ ] Build and tag devpi-v1.0.0
- [ ] Build and tag kubectl-v1.0.0
- [ ] Build and tag miniflux-v1.0.0
- [ ] Build and tag kiwix-serve-v1.0.0
- [ ] Build and tag transmission-v1.0.0
- [ ] Build and tag teslamate-v1.0.0
- [ ] Sync ArgoCD apps and verify services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/61
2026-01-25 21:35:57 -08:00
ea42362b6f Migrate Forgejo runner to Kubernetes with DinD (#60)
## Summary
- Deploy Forgejo runner to k8s with Docker-in-Docker sidecar
- Add job execution image with Node.js and Docker CLI
- Retire host-mode runner on indri
- All CI jobs now run containerized in k8s

## Components Added
- `containers/forgejo-runner/Dockerfile` - Job execution image
- `argocd/apps/forgejo-runner.yaml` - ArgoCD Application
- `argocd/manifests/forgejo-runner/` - Kubernetes manifests

## Components Removed
- `ansible/roles/forgejo_runner/` - No longer needed

## Changes to Existing Files
- `.forgejo/workflows/build-container.yaml` - Use `k8s` runner with `DOCKER_HOST` env
- `.github/actionlint.yaml` - Only `k8s` label now valid

## Deployment
1. Apply secret: `op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -`
2. Sync ArgoCD: `argocd app sync forgejo-runner`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/60
2026-01-25 19:56:17 -08:00
66badfafd1 Migrate k8s services to Caddy (*.ops.eblu.me) (#59)
All checks were successful
Build Container / build (push) Successful in 13s
nettest-v0.8.0
## Summary
- Add Caddy reverse proxy routes for all k8s services (grafana, argocd, prometheus, loki, miniflux, devpi, kiwix, torrent, teslamate)
- Add PostgreSQL via Caddy L4 TCP proxy on port 5432
- Caddy proxies to existing Tailscale endpoints - traffic stays local on indri
- Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs continue to work

## Updated References
- Alloy: prometheus/loki push endpoints → `*.ops.eblu.me`
- Borgmatic: PostgreSQL backup host → `pg.ops.eblu.me`
- Devpi: DEVPI_OUTSIDE_URL → `pypi.ops.eblu.me`
- indri-services-check: health check URLs
- CLAUDE.md: argocd login command

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy new Caddy config
- [ ] Test HTTP services: `curl https://grafana.ops.eblu.me/api/health`
- [ ] Test PostgreSQL: `pg_isready -h pg.ops.eblu.me -p 5432`
- [ ] Run `mise run provision-indri -- --tags alloy` to update Alloy endpoints
- [ ] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic
- [ ] Sync devpi in ArgoCD: `argocd app sync devpi`
- [ ] Re-login to ArgoCD: `argocd login argocd.ops.eblu.me ...`
- [ ] Run `mise run indri-services-check` to verify all services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/59
2026-01-25 12:56:31 -08:00
d6e6b48f6a Migrate registry to Caddy (registry.ops.eblu.me) (#58)
## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
2026-01-25 12:06:15 -08:00
9c1b7c7ca1 Update docs for Caddy migration (#57)
## Summary
- Update CLAUDE.md with new service routing documentation
- Document the two DNS domains: `*.ops.eblu.me` (Caddy) vs `*.tail8d86e.ts.net` (Tailscale)
- Fix incorrect service listings (Prometheus/Loki are in k8s, not indri)

## ZK Updates (not in this PR)
Also updated the blumeops zk card with:
- Source code URL (forge is primary, GitHub is mirror)
- Services split into Caddy vs Tailscale sections
- Updated port map for Caddy
- Updated "Adding a New Service" instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/57
2026-01-25 11:52:35 -08:00
1184b4de1d Add Caddy layer4 for Forgejo SSH (#56)
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin

## What This Enables
Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube

This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.

## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints

## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.

## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56
2026-01-25 11:37:23 -08:00
682a68dc9c Add Caddy reverse proxy for blumeops services (#55)
## Summary
- Add Caddy ansible role following zot pattern (manual build, ansible deploy)
- Caddy built with Gandi DNS plugin for ACME DNS-01 challenges
- Gandi PAT fetched from 1Password and written to secured file on indri
- Configure wildcard TLS for `*.ops.eblu.me`
- Initial services: forge, registry (indri-local)
- Uses port 8443 during testing to avoid Tailscale serve conflicts

## Build Instructions (already done)
On indri:
```bash
cd ~/code/3rd/caddy && mise run build
```

## Deployment and Testing
- [ ] Review Caddyfile configuration
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy
- [ ] Test: `curl -v https://forge.ops.eblu.me:8443` (should get TLS cert)
- [ ] Test: `curl -v https://registry.ops.eblu.me:8443/v2/` (should return `{}`)
- [ ] Once verified, switch to port 443 and migrate services from Tailscale serve

## Files Changed
- `ansible/playbooks/indri.yml` - Add pre_task for Gandi PAT, add caddy role
- `ansible/roles/caddy/` - New role with Caddyfile and LaunchAgent templates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/55
2026-01-25 09:35:06 -08:00
b08faa50cc Add Gandi DNS management via Pulumi (#54)
## Summary
- Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/`
- Add Gandi LiveDNS management for `eblu.me` domain
- Create wildcard DNS record `*.ops.eblu.me` → indri's Tailscale IP (100.98.163.89)
- Add mise tasks: `dns-up`, `dns-preview`
- Update `tailnet-up` to pass `--yes` by default
- Document PAT cycling process (expires every 30 days)

## Background
This enables using real DNS names (`*.ops.eblu.me`) that resolve to Tailscale IPs,
which allows containers and other systems to resolve services without depending on
MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain
tailnet-only while using standard DNS.

## Deployment and Testing
- [ ] Run `cd pulumi/gandi && uv sync` to install dependencies
- [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack
- [ ] Run `mise run dns-preview` to verify configuration
- [ ] Run `mise run dns-up` to apply DNS records
- [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54
2026-01-25 08:15:46 -08:00
af3536bc17 Simplify indri IP extraction from tailscale status
Some checks failed
Build Container / build (push) Failing after 13s
nettest-v0.7.0
Use simple grep and awk to parse plain text tailscale status output
instead of trying to parse JSON. Also show the status output for
debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 20:04:28 -08:00
04fdd5906d Get indri's IP from tailscale status for registry access
Some checks failed
Build Container / build (push) Failing after 9s
nettest-v0.6.0
Use 'tailscale status' to get indri's Tailscale IP and add it to
/etc/hosts for registry hostname resolution. The registry service
runs on indri, so we need indri's IP specifically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 20:00:57 -08:00
16ccabbc34 Resolve registry IP via Tailscale and add to /etc/hosts
Some checks failed
Build Container / build (push) Failing after 8s
nettest-v0.5.0
The Tailscale container's DNS doesn't work because it runs in userspace
mode. Instead, resolve the registry IP using 'tailscale ip' and add it
to /etc/hosts inside the container before running skopeo.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:44:39 -08:00
b4b8abb2d9 Add tag:ci-gateway to Tailscale ACL for CI builds
Some checks failed
Build Container / build (push) Failing after 13s
nettest-v0.4.0
- Add tag:ci-gateway to tagOwners
- Grant ci-gateway access to registry on port 443
- Add test for ci-gateway -> registry access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:38:02 -08:00
9f98b3007e Add debugging for Tailscale container failures
Some checks failed
Build Container / build (push) Failing after 7s
nettest-v0.3.0
Capture container logs when the Tailscale sidecar exits unexpectedly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:32:40 -08:00
424647cd93 Use Tailscale sidecar for container registry push
Some checks failed
Build Container / build (push) Failing after 1m9s
nettest-v0.2.0
Docker Desktop's VM can't resolve tailnet hostnames. Work around this by:
1. Starting a Tailscale container that joins the tailnet
2. Building the image with docker build
3. Saving to tarball with docker save
4. Pushing via skopeo inside the Tailscale container

Uses TS_CI_GATEWAY_AUTHKEY repository secret for authentication.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:29:01 -08:00
b598ea8d5a Add indri-runner-logs script to fetch workflow logs
Fetches logs for Forgejo Actions runs from indri's local storage.
Logs are stored as zstd-compressed files in the forgejo data directory.

Usage: mise run indri-runner-logs <run_id>

Only works for runs executed by the indri-host-runner.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:25:32 -08:00
31697b4d63 Add nettest container for CI/CD network debugging (#52)
Some checks failed
Build Container / build (push) Failing after 18s
nettest-v0.1.0
## Summary
- Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script
- Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags
- Test script checks DNS resolution and HTTPS connectivity to forge and registry

## Deployment and Testing
- [ ] Merge PR to main
- [ ] Run `mise run container-release nettest v0.1.0` to trigger first build
- [ ] Verify workflow runs successfully and container can reach tailnet services
- [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52
2026-01-24 16:54:35 -08:00
13b8d16eb2 Remove mention of plans dir
All checks were successful
Test CI / test (push) Successful in 3s
2026-01-24 16:22:39 -08:00