blumeops/docs/explanation/architecture.md
Erich Blume e6cf7e47e0
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00


---
title: Architecture
tags:
- explanation
- architecture
---
# Architecture Overview
> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.

How all the BlumeOps pieces fit together.
## Physical Layer
Two always-on devices form the infrastructure backbone:
```
┌─────────────────┐     ┌─────────────────┐
│      Indri      │     │     Sifaka      │
│   Mac Mini M1   │────▶│  Synology NAS   │
│    (compute)    │     │    (storage)    │
└─────────────────┘     └─────────────────┘
         │ Tailscale
┌─────────────────┐
│     Gilbert     │
│   MacBook Air   │
│  (workstation)  │
└─────────────────┘
```
- **[[indri]]** runs all services (native and containerized)
- **[[sifaka]]** provides bulk storage and backup targets
- **[[gilbert]]** is the development workstation
## Network Layer
[[tailscale]] provides the network fabric:
- All devices on tailnet `tail8d86e.ts.net`
- ACLs control access between devices and services
- MagicDNS provides `*.tail8d86e.ts.net` hostnames
- No port forwarding or public IPs on homelab devices
- Selected services exposed publicly via [[flyio-proxy]] (Fly.io → Tailscale tunnel)
## Service Routing
Three DNS domains route to services:

| Domain | Mechanism | Reachable from |
|--------|-----------|----------------|
| `*.eblu.me` | [[flyio-proxy]] (Fly.io → Tailscale tunnel) | Public internet |
| `*.ops.eblu.me` | Caddy reverse proxy on indri | k8s pods, containers, tailnet clients |
| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |

See [[routing]] for details on when to use which.
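
The table can be read as a suffix lookup. The sketch below is only an illustration of that reading, not code from the BlumeOps repo: `ROUTES` and `routing_mechanism` are hypothetical names, and `loki.tail8d86e.ts.net` in the usage note is an assumed hostname. One detail worth noticing is that `*.ops.eblu.me` is also matched by the broader `*.eblu.me` pattern, so the more specific suffix has to be checked first.

```python
from typing import Optional

# Hypothetical lookup table mirroring the routing table above.
# Order matters: ".ops.eblu.me" must be tested before ".eblu.me",
# because every *.ops.eblu.me name also ends with ".eblu.me".
ROUTES = [
    (".ops.eblu.me", "Caddy reverse proxy on indri"),
    (".tail8d86e.ts.net", "Tailscale MagicDNS"),
    (".eblu.me", "flyio-proxy (Fly.io -> Tailscale tunnel)"),
]


def routing_mechanism(hostname: str) -> Optional[str]:
    """Return the mechanism that serves `hostname`, or None if no domain matches."""
    host = hostname.lower().rstrip(".")  # normalise case and a trailing dot
    for suffix, mechanism in ROUTES:
        if host.endswith(suffix):
            return mechanism
    return None
```

For example, `routing_mechanism("forge.ops.eblu.me")` resolves to the Caddy entry rather than the public proxy, while `routing_mechanism("docs.eblu.me")` resolves to the Fly.io proxy.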
## Compute Layer
Services run in two places:
### Native on Indri (Ansible)
Some services run directly on macOS:
- [[forgejo]] - Git forge (needs filesystem access)
- [[zot]] - Container registry (k8s depends on it)
- [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
- [[borgmatic]] - Backups (needs host filesystem access)

Managed via Ansible roles in `ansible/roles/`.
### Kubernetes (ArgoCD)
Most services run in minikube on indri:
- [[grafana]], [[prometheus]], [[loki]] - Observability
- [[miniflux]], [[navidrome]], [[kiwix]] - Applications
- [[postgresql]] - Shared database (CloudNativePG)

Managed via ArgoCD from `argocd/manifests/`.
## Data Flow
```
┌──────────────┐
│   Git Repo   │
│  (Forgejo)   │
└──────┬───────┘
       │ push
┌──────────────┐     ┌──────────────┐
│    ArgoCD    │────▶│  Kubernetes  │
│  (watches)   │sync │    (runs)    │
└──────────────┘     └──────────────┘
       ┌────────────────────┼────────────────────┐
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Service    │     │   Service    │     │   Service    │
└──────────────┘     └──────────────┘     └──────────────┘
```
1. Code pushed to [[forgejo]]
2. [[argocd]] detects changes (or manual sync triggered)
3. ArgoCD applies manifests to cluster
4. Services start/update in Kubernetes
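
Steps 2 and 3 amount to comparing the desired state in Git with the live state in the cluster. The following is a toy model of that comparison, not ArgoCD's actual algorithm: `reconcile` is a hypothetical function, manifests are reduced to plain dictionaries, and the `"prune"` action is shown only for completeness, since whether removed resources are deleted depends on the sync settings.

```python
from typing import Dict


def reconcile(desired: Dict[str, dict], live: Dict[str, dict]) -> Dict[str, str]:
    """Return the action needed per resource to make `live` match `desired`.

    desired: resource name -> manifest as declared in Git
    live:    resource name -> manifest currently in the cluster
    """
    actions: Dict[str, str] = {}
    for name, manifest in desired.items():
        if name not in live:
            actions[name] = "create"   # declared in Git, missing in the cluster
        elif live[name] != manifest:
            actions[name] = "update"   # present in both, but drifted
    for name in live:
        if name not in desired:
            actions[name] = "prune"    # in the cluster, no longer in Git
    return actions
```

Resources whose live manifest already equals the desired one produce no action, which corresponds to an application that is in sync.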
## Observability
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Alloy    │────▶│ Prometheus  │────▶│   Grafana   │
│ (collector) │     │  (metrics)  │     │ (dashboards)│
└─────────────┘     └─────────────┘     └─────────────┘
       │                                       ▲
       │            ┌─────────────┐            │
       └───────────▶│    Loki     │────────────┘
                    │   (logs)    │
                    └─────────────┘
```
[[alloy]] runs in two places:
- On indri: collects host metrics and logs
- In k8s: collects pod logs and service probes

See [[observability]] for details.
## Secrets Flow
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  1Password  │────▶│  1Password  │────▶│  External   │
│   (vault)   │     │   Connect   │     │   Secrets   │
└─────────────┘     └─────────────┘     └─────────────┘
                                        ┌─────────────┐
                                        │ K8s Secret  │
                                        └─────────────┘
```
Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].
For Ansible, secrets are fetched via the `op` CLI in playbook `pre_tasks`.
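
To make the Ansible path more concrete, here is a rough sketch of fetching one secret with the 1Password CLI. It assumes the `op read` subcommand with an `op://vault/item/field` secret reference; the source does not say which `op` subcommand the playbooks use. The helper names and the vault, item, and field values in the usage note are hypothetical.

```python
import subprocess


def op_reference(vault: str, item: str, field: str) -> str:
    """Build a 1Password secret reference of the form op://vault/item/field."""
    return f"op://{vault}/{item}/{field}"


def read_secret(vault: str, item: str, field: str) -> str:
    """Fetch one secret value by shelling out to `op read`.

    Assumes the `op` CLI is installed and already signed in; raises
    subprocess.CalledProcessError if the lookup fails.
    """
    result = subprocess.run(
        ["op", "read", op_reference(vault, item, field)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()  # drop the trailing newline printed by the CLI
```

A call such as `read_secret("example-vault", "example-item", "password")` would return the field's value as a string, which a playbook could then register as a fact for later tasks.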
## Related
- [[why-gitops]] - Philosophy behind this approach
- [[security-model]] - Access control and secrets
- [[routing]] - Service routing details