blumeops/docs/explanation/architecture.md
Erich Blume e6cf7e47e0
2026-02-08 21:54:18 -08:00


title: Architecture
tags: explanation, architecture

Architecture Overview

Note: This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words; until then, these articles serve as placeholders to establish the documentation structure.

How all the BlumeOps pieces fit together.

Physical Layer

Two always-on devices, indri and sifaka, form the infrastructure backbone; gilbert connects to them over Tailscale:

┌─────────────────┐     ┌─────────────────┐
│     Indri       │     │     Sifaka      │
│  Mac Mini M1    │────▶│  Synology NAS   │
│  (compute)      │     │  (storage)      │
└─────────────────┘     └─────────────────┘
        │
        │ Tailscale
        ▼
┌─────────────────┐
│    Gilbert      │
│  MacBook Air    │
│  (workstation)  │
└─────────────────┘
  • indri runs all services (native and containerized)
  • sifaka provides bulk storage and backup targets
  • gilbert is the development workstation

Network Layer

tailscale provides the network fabric:

  • All devices on tailnet tail8d86e.ts.net
  • ACLs control access between devices and services
  • MagicDNS provides *.tail8d86e.ts.net hostnames
  • No port forwarding or public IPs on homelab devices
  • Selected services exposed publicly via flyio-proxy (Fly.io → Tailscale tunnel)
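
As a rough illustration of how tag-based ACLs gate access, the sketch below models a default-deny policy in which traffic passes only when an explicit accept rule matches. It is a toy model, not Tailscale's policy engine or file format; the `AcceptRule` type, the `is_allowed` helper, the tag names, and the port are all illustrative assumptions.

```python
from typing import NamedTuple


class AcceptRule(NamedTuple):
    """One hypothetical accept rule: src tag may reach dst tag on a port."""
    src: str
    dst: str
    port: int


def is_allowed(rules, src_tags, dst_tags, port):
    """Default deny: return True only if some accept rule matches."""
    return any(
        rule.src in src_tags and rule.dst in dst_tags and rule.port == port
        for rule in rules
    )


# Illustrative policy (assumed names): the public proxy may reach only
# endpoints that explicitly carry the opt-in target tag.
RULES = [AcceptRule("tag:flyio-proxy", "tag:flyio-target", 443)]

print(is_allowed(RULES, {"tag:flyio-proxy"}, {"tag:k8s", "tag:flyio-target"}, 443))
print(is_allowed(RULES, {"tag:flyio-proxy"}, {"tag:k8s"}, 443))
```

The point of the design is that an endpoint without the opt-in tag is unreachable from the proxy, no matter what other tags it carries.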

Service Routing

Three DNS domains route to services:

| Domain | Mechanism | Reachable from |
| --- | --- | --- |
| `*.eblu.me` | flyio-proxy (Fly.io → Tailscale tunnel) | Public internet |
| `*.ops.eblu.me` | Caddy reverse proxy on indri | k8s pods, containers, tailnet clients |
| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |

See routing for details on when to use which.
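
The three domains can be told apart by suffix, as long as the more specific `*.ops.eblu.me` is checked before `*.eblu.me`. The sketch below is only an illustration of that ordering; the `routing_mechanism` function and the example hostnames are assumptions, not part of BlumeOps.

```python
# Ordered most-specific first: "*.ops.eblu.me" also ends with ".eblu.me",
# so it must be tested before the public wildcard.
ROUTES = [
    (".ops.eblu.me", "Caddy reverse proxy on indri"),
    (".tail8d86e.ts.net", "Tailscale MagicDNS"),
    (".eblu.me", "flyio-proxy (Fly.io -> Tailscale tunnel)"),
]


def routing_mechanism(hostname: str) -> str:
    """Return the mechanism that serves a hostname, by domain suffix."""
    host = hostname.lower().rstrip(".")
    for suffix, mechanism in ROUTES:
        if host.endswith(suffix):
            return mechanism
    raise ValueError(f"{hostname!r} is not in a known BlumeOps domain")


# Example hostnames are illustrative.
for name in ("docs.eblu.me", "forge.ops.eblu.me", "indri.tail8d86e.ts.net"):
    print(name, "->", routing_mechanism(name))
```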

Compute Layer

Services run in two places:

Native on Indri (Ansible)

Some services run directly on macOS:

  • forgejo - Git forge (needs filesystem access)
  • zot - Container registry (k8s depends on it)
  • jellyfin - Media server (needs VideoToolbox hardware transcoding)
  • borgmatic - Backups (needs host filesystem access)

Managed via Ansible roles in ansible/roles/.

Kubernetes (ArgoCD)

Most services run in minikube on indri.

Managed via ArgoCD from argocd/manifests/.

Data Flow

┌──────────────┐
│   Git Repo   │
│  (Forgejo)   │
└──────┬───────┘
       │ push
       ▼
┌──────────────┐     ┌──────────────┐
│   ArgoCD     │────▶│  Kubernetes  │
│  (watches)   │sync │   (runs)     │
└──────────────┘     └──────────────┘
                            │
       ┌────────────────────┼────────────────────┐
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Service    │     │   Service    │     │   Service    │
└──────────────┘     └──────────────┘     └──────────────┘
  1. Code is pushed to forgejo
  2. argocd detects changes (or manual sync triggered)
  3. ArgoCD applies manifests to cluster
  4. Services start/update in Kubernetes
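
The detect-and-apply steps above amount to comparing desired state (manifests in Git) with live state (what the cluster runs). The sketch below is a toy version of that comparison, not ArgoCD's actual diffing logic; the `out_of_sync` function, the resource names, and the string "specs" are hypothetical.

```python
def out_of_sync(desired: dict, live: dict) -> dict:
    """Compare desired and live resources, keyed by resource name.

    Values stand in for manifest contents; any difference means drift.
    """
    return {
        # In Git but not in the cluster: needs to be created.
        "create": sorted(n for n in desired if n not in live),
        # In both, but the contents differ: needs to be updated.
        "update": sorted(n for n in desired if n in live and desired[n] != live[n]),
        # In the cluster but no longer in Git.
        "extra": sorted(n for n in live if n not in desired),
    }


# Hypothetical state: Git bumps one image tag and adds one resource.
desired = {"docs": "image:v2", "loki": "image:v1", "prometheus": "image:v1"}
live = {"docs": "image:v1", "loki": "image:v1", "old-job": "image:v1"}

print(out_of_sync(desired, live))
```

An empty result in all three lists corresponds to a synced application; anything else is what a sync would act on.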

Observability

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Alloy     │────▶│ Prometheus  │────▶│   Grafana   │
│ (collector) │     │  (metrics)  │     │ (dashboards)│
└─────────────┘     └─────────────┘     └─────────────┘
       │                                       ▲
       │            ┌─────────────┐            │
       └───────────▶│    Loki     │────────────┘
                    │   (logs)    │
                    └─────────────┘

alloy runs in two places:

  • On indri: collects host metrics and logs
  • In k8s: collects pod logs and service probes

See observability for details.

Secrets Flow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  1Password  │────▶│  1Password  │────▶│   External  │
│   (vault)   │     │   Connect   │     │   Secrets   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │  K8s Secret │
                                        └─────────────┘

Secrets live in 1Password and flow to Kubernetes via external-secrets.

For Ansible, secrets are fetched via the op CLI in playbook pre_tasks.
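
To make the op CLI path concrete, here is a minimal sketch of fetching one secret. It assumes the 1Password CLI's `op read` subcommand with an `op://vault/item/field` secret reference; the playbooks may use a different subcommand. The helper names and the vault, item, and field values are hypothetical.

```python
import subprocess


def secret_reference(vault: str, item: str, field: str) -> str:
    """Build a 1Password secret reference of the form op://vault/item/field."""
    return f"op://{vault}/{item}/{field}"


def op_read_command(reference: str) -> list:
    """Return the argv for reading one secret with the op CLI."""
    return ["op", "read", reference]


def fetch_secret(reference: str) -> str:
    """Run the op CLI and return the secret (requires a signed-in op)."""
    result = subprocess.run(
        op_read_command(reference),
        capture_output=True,
        text=True,
        check=True,  # raise if op exits non-zero
    )
    return result.stdout.rstrip("\n")


# Hypothetical names; nothing is executed here, only the command is built.
ref = secret_reference("example-vault", "example-item", "password")
print(op_read_command(ref))
```

Building the command separately from running it keeps the sketch testable on a machine without the op CLI installed.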