Add Phase 5: explanation documentation
Understanding-oriented content explaining the "why" behind BlumeOps:
- why-gitops: Philosophy of infrastructure-as-code for homelabs
- architecture: How all the pieces fit together (hosts, services, data flow)
- security-model: Tailscale networking, 1Password secrets, access control

Also updates docs/index.md with How-to and Explanation sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
e426473c59
commit
7be34dca60
8 changed files with 390 additions and 10 deletions
|
|
@ -111,15 +111,17 @@ Task-oriented instructions for specific operations.
|
|||
|
||||
**How-to URL:** https://docs.ops.eblu.me/how-to/
|
||||
|
||||
### Phase 5: Explanation
|
||||
### Phase 5: Explanation (Complete)
|
||||
Understanding-oriented discussion of concepts and decisions.
|
||||
|
||||
- [ ] Create `explanation/` directory
|
||||
- [ ] "Why GitOps?" - Philosophy and approach
|
||||
- [ ] "Architecture Overview" - How everything fits together
|
||||
- [ ] "Security Model" - Tailscale, secrets management, etc.
|
||||
- [ ] "Decision Log" - ADRs (Architecture Decision Records)
|
||||
- [ ] Update `exploring-the-docs` with Explanation section
|
||||
- [x] Create `explanation/` directory
|
||||
- [x] "Why GitOps?" - Philosophy and approach
|
||||
- [x] "Architecture Overview" - How everything fits together
|
||||
- [x] "Security Model" - Tailscale, secrets management, etc.
|
||||
- [ ] "Decision Log" - ADRs (Architecture Decision Records) - deferred
|
||||
- [x] Update `exploring-the-docs` with Explanation section
|
||||
|
||||
**Explanation URL:** https://docs.ops.eblu.me/explanation/
|
||||
|
||||
### Phase 6: Integration & Cleanup
|
||||
- [ ] Migrate remaining useful content from `docs/zk/`
|
||||
|
|
|
|||
1
docs/changelog.d/phase5-explanation.doc.md
Normal file
|
|
@ -0,0 +1 @@
|
|||
Add Phase 5 explanation docs: why GitOps, architecture overview, and security model
|
||||
147
docs/explanation/architecture.md
Normal file
|
|
@ -0,0 +1,147 @@
|
|||
---
|
||||
title: architecture
|
||||
tags:
|
||||
- explanation
|
||||
- architecture
|
||||
---
|
||||
|
||||
# Architecture Overview
|
||||
|
||||
How all the BlumeOps pieces fit together.
|
||||
|
||||
## Physical Layer
|
||||
|
||||
Two always-on devices form the infrastructure backbone:
|
||||
|
||||
```
|
||||
┌─────────────────┐     ┌─────────────────┐
│      Indri      │     │     Sifaka      │
│   Mac Mini M1   │────▶│  Synology NAS   │
│    (compute)    │     │    (storage)    │
└─────────────────┘     └─────────────────┘
         │
         │ Tailscale
         ▼
┌─────────────────┐
│     Gilbert     │
│   MacBook Air   │
│  (workstation)  │
└─────────────────┘
|
||||
```
|
||||
|
||||
- **[[indri]]** runs all services (native and containerized)
|
||||
- **[[sifaka]]** provides bulk storage and backup targets
|
||||
- **[[gilbert]]** is the development workstation
|
||||
|
||||
## Network Layer
|
||||
|
||||
[[tailscale]] provides the network fabric:
|
||||
|
||||
- All devices on tailnet `tail8d86e.ts.net`
|
||||
- ACLs control access between devices and services
|
||||
- MagicDNS provides `*.tail8d86e.ts.net` hostnames
|
||||
- No port forwarding or public IPs needed
|
||||
|
||||
## Service Routing
|
||||
|
||||
Two DNS domains route to services:
|
||||
|
||||
| Domain | Mechanism | Reachable from |
|
||||
|--------|-----------|----------------|
|
||||
| `*.ops.eblu.me` | Caddy reverse proxy on indri | Everywhere (k8s pods, containers, tailnet) |
|
||||
| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
|
||||
|
||||
See [[routing]] for details on when to use which.
|
||||
|
||||
## Compute Layer
|
||||
|
||||
Services run in two places:
|
||||
|
||||
### Native on Indri (Ansible)
|
||||
|
||||
Some services run directly on macOS:
|
||||
- [[forgejo]] - Git forge (needs filesystem access)
|
||||
- [[zot]] - Container registry (k8s depends on it)
|
||||
- [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
|
||||
- [[borgmatic]] - Backups (needs host filesystem access)
|
||||
|
||||
Managed via Ansible roles in `ansible/roles/`.
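
As a rough illustration of what "managed via Ansible roles" can look like, the sketch below applies one role per native service to indri. The play name, the `indri` inventory name, and the role names are assumptions based on the service names above, not the repository's actual playbook.

```yaml
# Hypothetical playbook sketch: role names are assumed to match the
# services listed above and to live under ansible/roles/.
- name: Provision indri
  hosts: indri  # assumed inventory hostname
  roles:
    - forgejo
    - zot
    - jellyfin
    - borgmatic
```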
|
||||
|
||||
### Kubernetes (ArgoCD)
|
||||
|
||||
Most services run in minikube on indri:
|
||||
- [[grafana]], [[prometheus]], [[loki]] - Observability
|
||||
- [[miniflux]], [[navidrome]], [[kiwix]] - Applications
|
||||
- [[postgresql]] - Shared database (CloudNativePG)
|
||||
|
||||
Managed via ArgoCD from `argocd/manifests/`.
|
||||
|
||||
## Data Flow
|
||||
|
||||
```
|
||||
┌──────────────┐
│   Git Repo   │
│  (Forgejo)   │
└──────┬───────┘
       │ push
       ▼
┌──────────────┐     ┌──────────────┐
│    ArgoCD    │────▶│  Kubernetes  │
│  (watches)   │sync │    (runs)    │
└──────────────┘     └──────────────┘
                            │
       ┌────────────────────┼────────────────────┐
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Service    │     │   Service    │     │   Service    │
└──────────────┘     └──────────────┘     └──────────────┘
|
||||
```
|
||||
|
||||
1. Code pushed to [[forgejo]]
|
||||
2. [[argocd]] detects changes (or manual sync triggered)
|
||||
3. ArgoCD applies manifests to cluster
|
||||
4. Services start/update in Kubernetes
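
The steps above hinge on ArgoCD knowing which Git path maps to which cluster namespace. A minimal sketch of an ArgoCD `Application` that expresses this mapping follows. The application name, repository URL, subdirectory, and namespace are placeholders, not values taken from the BlumeOps repo.

```yaml
# Illustrative only: name, repoURL, path suffix, and namespace are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://forge.example/owner/repo.git  # placeholder Forgejo URL
    targetRevision: main
    path: argocd/manifests/example-service  # hypothetical subdirectory
  destination:
    server: https://kubernetes.default.svc  # the cluster ArgoCD runs in
    namespace: example-service
```

With no `syncPolicy` set, ArgoCD reports drift but waits for a manual sync, which corresponds to the "manual sync triggered" case in step 2.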
|
||||
|
||||
## Observability
|
||||
|
||||
```
|
||||
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Alloy    │────▶│ Prometheus  │────▶│   Grafana   │
│ (collector) │     │  (metrics)  │     │ (dashboards)│
└─────────────┘     └─────────────┘     └─────────────┘
       │                                       ▲
       │            ┌─────────────┐            │
       └───────────▶│    Loki     │────────────┘
                    │   (logs)    │
                    └─────────────┘
|
||||
```
|
||||
|
||||
[[alloy]] runs in two places:
|
||||
- On indri: collects host metrics and logs
|
||||
- In k8s: collects pod logs and service probes
|
||||
|
||||
See [[observability]] for details.
|
||||
|
||||
## Secrets Flow
|
||||
|
||||
```
|
||||
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  1Password  │────▶│  1Password  │────▶│  External   │
│   (vault)   │     │   Connect   │     │   Secrets   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │ K8s Secret  │
                                        └─────────────┘
|
||||
```
|
||||
|
||||
Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].
|
||||
|
||||
For Ansible, secrets are fetched via `op` CLI in playbook pre_tasks.
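
To make the Kubernetes half of this flow concrete, here is a hedged sketch of an `ExternalSecret` that asks the operator to materialize one 1Password field as a native Secret. The store name, item name, field name, and namespace are all hypothetical, and the `apiVersion` depends on the installed External Secrets Operator version.

```yaml
# Sketch under assumptions: store, item, and key names are placeholders.
apiVersion: external-secrets.io/v1beta1  # newer operator releases may use v1
kind: ExternalSecret
metadata:
  name: example-credentials
  namespace: example-service
spec:
  refreshInterval: 1h  # how often the operator re-reads the source
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword  # hypothetical store backed by 1Password Connect
  target:
    name: example-credentials  # name of the K8s Secret to create
  data:
    - secretKey: password  # key inside the resulting K8s Secret
      remoteRef:
        key: example-item  # hypothetical 1Password item title
        property: password  # field within that item
```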
|
||||
|
||||
## Related
|
||||
|
||||
- [[why-gitops]] - Philosophy behind this approach
|
||||
- [[security-model]] - Access control and secrets
|
||||
- [[routing]] - Service routing details
|
||||
22
docs/explanation/index.md
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
---
|
||||
title: explanation
|
||||
tags:
|
||||
- explanation
|
||||
---
|
||||
|
||||
# Explanation
|
||||
|
||||
Understanding-oriented content explaining the "why" behind BlumeOps design decisions.
|
||||
|
||||
## Philosophy
|
||||
|
||||
| Article | Description |
|
||||
|---------|-------------|
|
||||
| [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |
|
||||
|
||||
## Design
|
||||
|
||||
| Article | Description |
|
||||
|---------|-------------|
|
||||
| [[architecture]] | How all the pieces fit together |
|
||||
| [[security-model]] | Network security, secrets, and access control |
|
||||
137
docs/explanation/security-model.md
Normal file
|
|
@ -0,0 +1,137 @@
|
|||
---
|
||||
title: security-model
|
||||
tags:
|
||||
- explanation
|
||||
- security
|
||||
---
|
||||
|
||||
# Security Model
|
||||
|
||||
How BlumeOps handles network security, secrets, and access control.
|
||||
|
||||
## Network Security: Tailscale
|
||||
|
||||
The foundational security decision is using [[tailscale]] as the network layer.
|
||||
|
||||
### Zero Trust Networking
|
||||
|
||||
BlumeOps has no public IP addresses or port forwarding. All services are only accessible via Tailscale:
|
||||
|
||||
- **No inbound attack surface** from the public internet
|
||||
- **Encrypted by default** - WireGuard encryption for all traffic
|
||||
- **Identity-based access** - ACLs based on user/device identity, not IP addresses
|
||||
|
||||
### Defense in Depth
|
||||
|
||||
Even within the tailnet, access is restricted:
|
||||
|
||||
```
|
||||
Internet ──X──▶ Services (no public access)
|
||||
|
||||
Tailnet:
|
||||
Admin ────────▶ All services
|
||||
Member ───────▶ User-facing services only
|
||||
Homelab tag ──▶ NAS (for backups)
|
||||
```
|
||||
|
||||
See [[tailscale]] for the full ACL matrix.
|
||||
|
||||
## Secrets Management
|
||||
|
||||
Secrets follow a hierarchy:
|
||||
|
||||
### Source of Truth: 1Password
|
||||
|
||||
All secrets originate in 1Password's `blumeops` vault:
|
||||
- API keys, tokens, passwords
|
||||
- SSH keys and certificates
|
||||
- OAuth credentials
|
||||
|
||||
### Kubernetes: External Secrets Operator
|
||||
|
||||
[[external-secrets]] syncs secrets from 1Password to Kubernetes:
|
||||
|
||||
```
|
||||
1Password ──▶ 1Password Connect ──▶ ExternalSecret ──▶ K8s Secret
|
||||
```
|
||||
|
||||
Services reference native Kubernetes Secrets; they don't know about 1Password.
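
For example, a workload might consume a synced value like any other Secret. The sketch below assumes a Secret named `example-credentials` with a `password` key already exists in the namespace. The pod name, image, and environment variable are placeholders.

```yaml
# Illustrative consumer: nothing here refers to 1Password.
apiVersion: v1
kind: Pod
metadata:
  name: example-service
spec:
  containers:
    - name: app
      image: registry.example/app:latest  # placeholder image
      env:
        - name: APP_PASSWORD  # hypothetical variable the app reads
          valueFrom:
            secretKeyRef:
              name: example-credentials  # assumed Secret name
              key: password
```

Because the reference is an ordinary `secretKeyRef`, swapping the secret backend would not require touching the workload manifest.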
|
||||
|
||||
### Ansible: op CLI
|
||||
|
||||
Ansible playbooks fetch secrets at runtime via `op` CLI:
|
||||
|
||||
```yaml
|
||||
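- name: Fetch secret
  command: op item get <id> --fields password --reveal
  delegate_to: localhost
  register: op_secret  # illustrative variable name
  changed_when: false  # a read-only lookup never changes the host
  no_log: true  # keep the secret out of Ansible output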
|
||||
```
|
||||
|
||||
Secrets are held in memory as Ansible facts, never written to disk.
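
One way the "held in memory as facts" pattern could look in a full play is sketched below. The host name, variable names, and role name are assumptions for illustration, and `<id>` remains a placeholder for a real 1Password item ID.

```yaml
# Sketch only: op_result, service_password, and example_role are hypothetical.
- hosts: indri
  pre_tasks:
    - name: Fetch secret from 1Password
      ansible.builtin.command: op item get <id> --fields password --reveal
      delegate_to: localhost
      register: op_result
      changed_when: false
      no_log: true

    - name: Hold the value as an in-memory fact
      ansible.builtin.set_fact:
        service_password: "{{ op_result.stdout }}"
      no_log: true
  roles:
    - example_role  # would read service_password from the fact
```

The fact lives only for the duration of the run unless fact caching is enabled, so this pattern presumably relies on caching being off.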
|
||||
|
||||
### Git Repository
|
||||
|
||||
The repository is public. Secrets must never be committed:
|
||||
- `.gitignore` excludes sensitive patterns
|
||||
- Pre-commit hooks scan for potential secrets (TruffleHog)
|
||||
- All config files use references to secrets, not values
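
A pre-commit hook for TruffleHog might be wired up roughly as follows. This is a sketch modeled on the local-hook pattern from TruffleHog's own documentation, not the repository's actual `.pre-commit-config.yaml`, and it assumes the `trufflehog` binary is installed on the workstation.

```yaml
# Hypothetical .pre-commit-config.yaml fragment.
repos:
  - repo: local
    hooks:
      - id: trufflehog
        name: TruffleHog
        description: Scan the pending changes for secrets.
        # --fail makes the hook exit non-zero when a finding is reported
        entry: bash -c 'trufflehog git file://. --since-commit HEAD --fail'
        language: system
```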
|
||||
|
||||
## Access Control Philosophy
|
||||
|
||||
### Principle of Least Privilege
|
||||
|
||||
Services and devices get minimum necessary access:
|
||||
|
||||
| Entity | Access |
|
||||
|--------|--------|
|
||||
| Admin users | Everything |
|
||||
| Member users | User-facing services only |
|
||||
| Homelab servers | Only what they need (NAS for backups) |
|
||||
| K8s pods | No Tailscale access (use Caddy proxy) |
|
||||
|
||||
### Tagged Devices vs User Devices
|
||||
|
||||
An important Tailscale distinction:
|
||||
- **User devices** (like gilbert) have user identity and inherit user ACLs
|
||||
- **Tagged devices** (like indri with `tag:homelab`) lose user identity
|
||||
|
||||
Don't tag user devices - it breaks user-based access rules.
|
||||
|
||||
## Authentication Patterns
|
||||
|
||||
### Service-to-Service
|
||||
|
||||
Internal services use:
|
||||
- Kubernetes service discovery (no auth needed within cluster)
|
||||
- Tailscale identity for cross-host communication
|
||||
|
||||
### User-to-Service
|
||||
|
||||
Users authenticate via:
|
||||
- Service-specific credentials (stored in 1Password)
|
||||
- Some services support Tailscale identity (future)
|
||||
|
||||
### AI/Automation Access
|
||||
|
||||
Claude Code and automation use:
|
||||
- SSH keys for git operations
|
||||
- ArgoCD tokens for deployments
|
||||
- 1Password CLI for secret retrieval (requires user approval)
|
||||
|
||||
## What's Not Protected
|
||||
|
||||
Honest assessment of security boundaries:
|
||||
|
||||
- **Local network attacks** - If someone is on your home WiFi, they could potentially access the NAS directly
|
||||
- **Physical access** - No disk encryption on servers (trade-off for reliability)
|
||||
- **Supply chain** - Container images from upstream registries
|
||||
- **Operator error** - Misconfigured ACLs or leaked credentials
|
||||
|
||||
The model assumes a trusted home network and focuses on protecting against internet-based attacks.
|
||||
|
||||
## Related
|
||||
|
||||
- [[tailscale]] - ACL configuration
|
||||
- [[1password]] - Secrets management
|
||||
- [[external-secrets]] - Kubernetes secrets
|
||||
- [[architecture]] - Overall system design
|
||||
68
docs/explanation/why-gitops.md
Normal file
|
|
@ -0,0 +1,68 @@
|
|||
---
|
||||
title: why-gitops
|
||||
tags:
|
||||
- explanation
|
||||
- philosophy
|
||||
---
|
||||
|
||||
# Why GitOps?
|
||||
|
||||
BlumeOps uses GitOps principles for managing personal infrastructure. This might seem like overkill for a homelab, but there are good reasons.
|
||||
|
||||
## The Problem with Manual Infrastructure
|
||||
|
||||
Traditional server management involves SSHing into machines and running commands. This works, but creates problems:
|
||||
|
||||
- **Drift**: The actual state diverges from what you think it is
|
||||
- **Amnesia**: You forget what you changed and why
|
||||
- **Fragility**: One bad command can break things with no easy rollback
|
||||
- **Bus factor**: Only you know how it works (even AI assistants struggle without context)
|
||||
|
||||
## Git as the Source of Truth
|
||||
|
||||
GitOps inverts the model: instead of pushing changes to servers, you commit desired state to Git, and automation pulls it into reality.
|
||||
|
||||
**Benefits:**
|
||||
- Every change is tracked with commit history
|
||||
- Pull requests enable review before deployment
|
||||
- Rollback is just `git revert`
|
||||
- The repo *is* the documentation
|
||||
|
||||
## Why This Matters for a Homelab
|
||||
|
||||
A personal homelab isn't a production environment, but it shares the same challenges:
|
||||
|
||||
1. **Memory is unreliable** - Six months from now, you won't remember why you configured Caddy that way
|
||||
2. **Experimentation is constant** - You try things, break things, want to undo things
|
||||
3. **AI assistance needs context** - Claude can help much more effectively when it can read your infrastructure as code
|
||||
|
||||
## The BlumeOps Approach
|
||||
|
||||
BlumeOps uses layered GitOps:
|
||||
|
||||
| Layer | Tool | What it manages |
|
||||
|-------|------|-----------------|
|
||||
| **Tailnet** | [[reference/infrastructure/tailscale|Pulumi]] | ACLs, tags, DNS |
|
||||
| **Host config** | [[reference/ansible/roles|Ansible]] | Services on [[indri]] |
|
||||
| **Kubernetes** | [[argocd|ArgoCD]] | Containerized workloads |
|
||||
|
||||
Each layer has its own reconciliation loop:
|
||||
- Pulumi applies on `mise run tailnet-up`
|
||||
- Ansible applies on `mise run provision-indri`
|
||||
- ArgoCD watches Git and syncs manually or automatically
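
For the ArgoCD layer, the choice between manual and automatic sync is a per-application setting. The fragment below sketches the automatic case using standard `Application` fields. Which BlumeOps applications, if any, enable it is not stated here.

```yaml
# Fragment of an ArgoCD Application spec (illustrative).
spec:
  syncPolicy:
    automated:
      prune: true  # delete resources that were removed from Git
      selfHeal: true  # revert changes made directly in the cluster
```

Omitting `syncPolicy.automated` leaves the application in manual mode, where ArgoCD shows the diff but applies nothing until a sync is requested.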
|
||||
|
||||
## Trade-offs
|
||||
|
||||
GitOps isn't free:
|
||||
|
||||
- **Learning curve** - You need to understand Ansible, ArgoCD, Pulumi
|
||||
- **Indirection** - Can't just `apt install` something; need to add it to config
|
||||
- **Complexity** - More moving parts than a simple server
|
||||
|
||||
But for BlumeOps, the trade-off is worth it. The infrastructure is complex enough that managing it imperatively would be error-prone, and the GitOps approach enables effective AI-assisted operations.
|
||||
|
||||
## Related
|
||||
|
||||
- [[architecture]] - How the pieces fit together
|
||||
- [[argocd]] - Kubernetes GitOps
|
||||
- [[reference/ansible/roles|Ansible roles]] - Host configuration
|
||||
|
|
@ -9,7 +9,9 @@ Welcome to the BlumeOps documentation.
|
|||
## Sections
|
||||
|
||||
- [[tutorials/index | Tutorials]] - Learning-oriented guides for getting started
|
||||
- [[reference/index | Reference]] - Technical reference cards for services, infrastructure, and operations
|
||||
- [[reference/index | Reference]] - Technical specifications and service details
|
||||
- [[how-to/index | How-to]] - Task-oriented instructions for common operations
|
||||
- [[explanation/index | Explanation]] - Understanding the "why" behind BlumeOps
|
||||
|
||||
## About
|
||||
|
||||
|
|
|
|||
|
|
@ -20,7 +20,7 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
|
|||
| **[[tutorials/index | Tutorials]]** | Learning-oriented | "I'm new and want to understand" |
|
||||
| **[[reference/index | Reference]]** | Information-oriented | "I need specific technical details" |
|
||||
| **[[how-to/index | How-to]]** | Task-oriented | "I need to do X" |
|
||||
| **Explanation** (planned) | Understanding-oriented | "I want to understand why" |
|
||||
| **[[explanation/index | Explanation]]** | Understanding-oriented | "I want to understand why" |
|
||||
|
||||
## Quick Paths by Audience
|
||||
|
||||
|
|
@ -42,9 +42,9 @@ Context for effective assistance:
|
|||
### For External Readers
|
||||
|
||||
Understanding what this is:
|
||||
- [[explanation/index|Explanation]] covers the "why" behind design decisions
|
||||
- [[reference/index|Reference]] shows what's actually running
|
||||
- Browse service pages to see specific implementations
|
||||
- The repo's README has project context
|
||||
|
||||
### For Contributors
|
||||
|
||||
|
|
@ -58,6 +58,7 @@ Getting started with changes:
|
|||
Replicators are people who want to build their own similar homelab GitOps setup, using BlumeOps as inspiration.
|
||||
|
||||
- [[replicating-blumeops]] provides the overview
|
||||
- [[explanation/index|Explanation]] covers architecture and design rationale
|
||||
- The `replication/` tutorials go deep on components
|
||||
- Reference pages show specific configuration choices
|
||||
|
||||
|
|
|
|||