Add Phase 5: explanation documentation (#96)
## Summary

- Create `docs/explanation/` directory with index and three explanation articles
  - why-gitops: Philosophy of GitOps for homelabs (memory, rollback, AI context)
  - architecture: How pieces fit together (ASCII diagrams of hosts, data flow, secrets)
  - security-model: Tailscale zero-trust, 1Password secrets, access control philosophy
- Update docs/index.md with How-to and Explanation section links
- Update exploring-the-docs to link Explanation section

Decision log deferred to future work.

## Deployment and Testing

- [x] Pre-commit hooks pass (including doc-links validator)
- [ ] Build and deploy to docs.ops.eblu.me to verify rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/96
parent e426473c59
commit 0a28622751
8 changed files with 396 additions and 10 deletions
@@ -111,15 +111,17 @@ Task-oriented instructions for specific operations.
 
 **How-to URL:** https://docs.ops.eblu.me/how-to/
 
-### Phase 5: Explanation
+### Phase 5: Explanation (Complete)
 Understanding-oriented discussion of concepts and decisions.
 
-- [ ] Create `explanation/` directory
-- [ ] "Why GitOps?" - Philosophy and approach
-- [ ] "Architecture Overview" - How everything fits together
-- [ ] "Security Model" - Tailscale, secrets management, etc.
-- [ ] "Decision Log" - ADRs (Architecture Decision Records)
-- [ ] Update `exploring-the-docs` with Explanation section
+- [x] Create `explanation/` directory
+- [x] "Why GitOps?" - Philosophy and approach
+- [x] "Architecture Overview" - How everything fits together
+- [x] "Security Model" - Tailscale, secrets management, etc.
+- [ ] "Decision Log" - ADRs (Architecture Decision Records) - deferred
+- [x] Update `exploring-the-docs` with Explanation section
 
+**Explanation URL:** https://docs.ops.eblu.me/explanation/
+
 ### Phase 6: Integration & Cleanup
 - [ ] Migrate remaining useful content from `docs/zk/`

docs/changelog.d/phase5-explanation.doc.md (new file, 1 line)
@@ -0,0 +1 @@
Add Phase 5 explanation docs: why GitOps, architecture overview, and security model

docs/explanation/architecture.md (new file, 149 lines)
@@ -0,0 +1,149 @@
---
title: architecture
tags:
  - explanation
  - architecture
---

# Architecture Overview

> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.

How all the BlumeOps pieces fit together.

## Physical Layer

Two always-on devices form the infrastructure backbone:

```
┌─────────────────┐     ┌─────────────────┐
│      Indri      │     │     Sifaka      │
│   Mac Mini M1   │────▶│  Synology NAS   │
│    (compute)    │     │    (storage)    │
└─────────────────┘     └─────────────────┘
         │
         │ Tailscale
         ▼
┌─────────────────┐
│     Gilbert     │
│   MacBook Air   │
│  (workstation)  │
└─────────────────┘
```

- **[[indri]]** runs all services (native and containerized)
- **[[sifaka]]** provides bulk storage and backup targets
- **[[gilbert]]** is the development workstation

## Network Layer

[[tailscale]] provides the network fabric:

- All devices on tailnet `tail8d86e.ts.net`
- ACLs control access between devices and services
- MagicDNS provides `*.tail8d86e.ts.net` hostnames
- No port forwarding or public IPs needed

## Service Routing

Two DNS domains route to services:

| Domain | Mechanism | Reachable from |
|--------|-----------|----------------|
| `*.ops.eblu.me` | Caddy reverse proxy on indri | Everywhere (k8s pods, containers, tailnet) |
| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |

See [[routing]] for details on when to use which.

## Compute Layer

Services run in two places:

### Native on Indri (Ansible)

Some services run directly on macOS:
- [[forgejo]] - Git forge (needs filesystem access)
- [[zot]] - Container registry (k8s depends on it)
- [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
- [[borgmatic]] - Backups (needs host filesystem access)

Managed via Ansible roles in `ansible/roles/`.

### Kubernetes (ArgoCD)

Most services run in minikube on indri:
- [[grafana]], [[prometheus]], [[loki]] - Observability
- [[miniflux]], [[navidrome]], [[kiwix]] - Applications
- [[postgresql]] - Shared database (CloudNativePG)

Managed via ArgoCD from `argocd/manifests/`.

## Data Flow

```
┌──────────────┐
│   Git Repo   │
│  (Forgejo)   │
└──────┬───────┘
       │ push
       ▼
┌──────────────┐     ┌──────────────┐
│    ArgoCD    │────▶│  Kubernetes  │
│  (watches)   │sync │    (runs)    │
└──────────────┘     └──────────────┘
                            │
       ┌────────────────────┼────────────────────┐
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Service    │     │   Service    │     │   Service    │
└──────────────┘     └──────────────┘     └──────────────┘
```

1. Code pushed to [[forgejo]]
2. [[argocd]] detects changes (or manual sync triggered)
3. ArgoCD applies manifests to cluster
4. Services start/update in Kubernetes
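
The watch-and-sync steps above map onto an Argo CD `Application` resource. What follows is a minimal sketch, not the manifest BlumeOps actually uses: the `miniflux` subdirectory, the `main` branch, the `argocd` and `miniflux` namespaces, and the `default` project are all assumptions for illustration. Only the repository host and the `argocd/manifests/` prefix come from this page.

```yaml
# Hypothetical Application: names, branch, and namespaces are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: miniflux              # assumed app name
  namespace: argocd           # assumed namespace where Argo CD runs
spec:
  project: default
  source:
    # Step 1: the Git repository that ArgoCD watches
    repoURL: https://forge.ops.eblu.me/eblume/blumeops
    targetRevision: main      # assumed branch
    path: argocd/manifests/miniflux   # assumed subdirectory
  destination:
    # Steps 3-4: apply the manifests to the cluster ArgoCD itself runs in
    server: https://kubernetes.default.svc
    namespace: miniflux       # assumed target namespace
```

With no `syncPolicy` set, changes show up as "OutOfSync" and wait for a manual sync, which matches the "or manual sync triggered" case in step 2.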

## Observability

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Alloy    │────▶│ Prometheus  │────▶│   Grafana   │
│ (collector) │     │  (metrics)  │     │ (dashboards)│
└─────────────┘     └─────────────┘     └─────────────┘
       │                                       ▲
       │            ┌─────────────┐            │
       └───────────▶│    Loki     │────────────┘
                    │   (logs)    │
                    └─────────────┘
```

[[alloy]] runs in two places:
- On indri: collects host metrics and logs
- In k8s: collects pod logs and service probes

See [[observability]] for details.

## Secrets Flow

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  1Password  │────▶│  1Password  │────▶│  External   │
│   (vault)   │     │   Connect   │     │   Secrets   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │ K8s Secret  │
                                        └─────────────┘
```

Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].

For Ansible, secrets are fetched via `op` CLI in playbook pre_tasks.
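
The `pre_tasks` pattern could look roughly like the playbook below. This is a sketch under stated assumptions: the play name, the `forgejo` role name, and the variable names `forgejo_password_result` and `forgejo_admin_password` are hypothetical, and `<id>` stands in for a real 1Password item ID that this page does not give.

```yaml
# Hypothetical playbook: role and variable names are illustrative only.
- name: Provision indri
  hosts: indri
  pre_tasks:
    - name: Fetch a secret from 1Password on the control machine
      ansible.builtin.command: op item get <id> --fields password --reveal
      delegate_to: localhost    # `op` runs where the operator is signed in
      register: forgejo_password_result
      changed_when: false       # reading a secret changes nothing
      no_log: true              # keep the value out of task output

    - name: Expose the secret to roles as an in-memory fact
      ansible.builtin.set_fact:
        forgejo_admin_password: "{{ forgejo_password_result.stdout }}"
      no_log: true
  roles:
    - forgejo                   # assumed role under ansible/roles/
```

Because `pre_tasks` run before `roles`, every role in the play can read the fact without knowing where it came from.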

## Related

- [[why-gitops]] - Philosophy behind this approach
- [[security-model]] - Access control and secrets
- [[routing]] - Service routing details

docs/explanation/index.md (new file, 22 lines)
@@ -0,0 +1,22 @@
---
title: explanation
tags:
  - explanation
---

# Explanation

Understanding-oriented content explaining the "why" behind BlumeOps design decisions.

## Philosophy

| Article | Description |
|---------|-------------|
| [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |

## Design

| Article | Description |
|---------|-------------|
| [[architecture]] | How all the pieces fit together |
| [[security-model]] | Network security, secrets, and access control |

docs/explanation/security-model.md (new file, 139 lines)
@@ -0,0 +1,139 @@
---
title: security-model
tags:
  - explanation
  - security
---

# Security Model

> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.

How BlumeOps handles network security, secrets, and access control.

## Network Security: Tailscale

The foundational security decision is using [[tailscale]] as the network layer.

### Zero Trust Networking

BlumeOps has no public IP addresses or port forwarding. All services are only accessible via Tailscale:

- **No attack surface** from the public internet
- **Encrypted by default** - WireGuard encryption for all traffic
- **Identity-based access** - ACLs based on user/device identity, not IP addresses

### Defense in Depth

Even within the tailnet, access is restricted:

```
Internet ──X──▶ Services (no public access)

Tailnet:
  Admin ────────▶ All services
  Member ───────▶ User-facing services only
  Homelab tag ──▶ NAS (for backups)
```

See [[tailscale]] for the full ACL matrix.

## Secrets Management

Secrets follow a hierarchy:

### Source of Truth: 1Password

All secrets originate in 1Password's `blumeops` vault:
- API keys, tokens, passwords
- SSH keys and certificates
- OAuth credentials

### Kubernetes: External Secrets Operator

[[external-secrets]] syncs secrets from 1Password to Kubernetes:

```
1Password ──▶ 1Password Connect ──▶ ExternalSecret ──▶ K8s Secret
```

Services reference native Kubernetes Secrets; they don't know about 1Password.
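
To make the sync step concrete, here is a sketch of an `ExternalSecret` that asks the operator to materialize one 1Password field as a Kubernetes Secret. It is an illustration, not a manifest from this repository: the `onepassword` store name, the `miniflux-db` item and Secret names, the `miniflux` namespace, and the refresh interval are all assumptions. The `apiVersion` also depends on the installed operator version (newer releases serve `external-secrets.io/v1`).

```yaml
# Hypothetical ExternalSecret: store, item, and Secret names are illustrative.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: miniflux-db
  namespace: miniflux
spec:
  refreshInterval: 1h           # assumed re-sync period
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword           # assumed store backed by 1Password Connect
  target:
    name: miniflux-db           # the native K8s Secret that gets created
  data:
    - secretKey: password       # key inside the K8s Secret
      remoteRef:
        key: miniflux-db        # 1Password item title (assumed)
        property: password      # field within that item
```

A workload would then mount or reference the `miniflux-db` Secret in the usual way, for example through `secretKeyRef`, with no 1Password-specific configuration of its own.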

### Ansible: op CLI

Ansible playbooks fetch secrets at runtime via `op` CLI:

```yaml
- name: Fetch secret
  command: op item get <id> --fields password --reveal
  delegate_to: localhost
  register: secret_result  # illustrative name; later tasks read secret_result.stdout
  changed_when: false      # a read-only lookup never changes state
  no_log: true             # keep the secret out of Ansible's output and logs
```

Secrets are held in memory as Ansible facts, never written to disk.

### Git Repository

The repository is public. Secrets must never be committed:
- `.gitignore` excludes sensitive patterns
- Pre-commit hooks scan for potential secrets (TruffleHog)
- All config files use references to secrets, not values
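
As an illustration of the pre-commit scanning step, a `.pre-commit-config.yaml` entry along the following lines would run TruffleHog against the commit being created. This is a sketch adapted from the local-hook pattern in TruffleHog's own documentation, not the configuration this repository uses; it assumes a `trufflehog` binary is on `PATH`, and the flags should be checked against the installed version.

```yaml
# Hypothetical hook definition; not copied from the BlumeOps repository.
repos:
  - repo: local
    hooks:
      - id: trufflehog
        name: TruffleHog
        description: Scan the pending commit for credentials
        # --since-commit HEAD limits the scan to new changes;
        # --fail makes the hook exit non-zero when something is found.
        entry: bash -c 'trufflehog git file://. --since-commit HEAD --fail'
        language: system
```

A non-zero exit from the hook aborts the commit, so a detected secret never reaches the public repository's history.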

## Access Control Philosophy

### Principle of Least Privilege

Services and devices get minimum necessary access:

| Entity | Access |
|--------|--------|
| Admin users | Everything |
| Member users | User-facing services only |
| Homelab servers | Only what they need (NAS for backups) |
| K8s pods | No Tailscale access (use Caddy proxy) |

### Tagged Devices vs User Devices

Important Tailscale concept:
- **User devices** (like gilbert) have user identity and inherit user ACLs
- **Tagged devices** (like indri with `tag:homelab`) lose user identity

Don't tag user devices - it breaks user-based access rules.

## Authentication Patterns

### Service-to-Service

Internal services use:
- Kubernetes service discovery (no auth needed within cluster)
- Tailscale identity for cross-host communication

### User-to-Service

Users authenticate via:
- Service-specific credentials (stored in 1Password)
- Some services support Tailscale identity (future)

### AI/Automation Access

Claude Code and automation use:
- SSH keys for git operations
- ArgoCD tokens for deployments
- 1Password CLI for secret retrieval (requires user approval)

## What's Not Protected

Honest assessment of security boundaries:

- **Local network attacks** - If someone is on your home WiFi, they could potentially access the NAS directly
- **Physical access** - No disk encryption on servers (trade-off for reliability)
- **Supply chain** - Container images from upstream registries
- **Operator error** - Misconfigured ACLs or leaked credentials

The model assumes a trusted home network and focuses on protecting against internet-based attacks.

## Related

- [[tailscale]] - ACL configuration
- [[1password]] - Secrets management
- [[external-secrets]] - Kubernetes secrets
- [[architecture]] - Overall system design

docs/explanation/why-gitops.md (new file, 70 lines)
@@ -0,0 +1,70 @@
---
title: why-gitops
tags:
  - explanation
  - philosophy
---

# Why GitOps?

> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.

BlumeOps uses GitOps principles for managing personal infrastructure. This might seem like overkill for a homelab, but there are good reasons.

## The Problem with Manual Infrastructure

Traditional server management involves SSHing into machines and running commands. This works, but creates problems:

- **Drift**: The actual state diverges from what you think it is
- **Amnesia**: You forget what you changed and why
- **Fragility**: One bad command can break things with no easy rollback
- **Bus factor**: Only you know how it works (even AI assistants struggle without context)

## Git as the Source of Truth

GitOps inverts the model: instead of pushing changes to servers, you commit desired state to Git, and automation pulls it into reality.

**Benefits:**
- Every change is tracked with commit history
- Pull requests enable review before deployment
- Rollback is just `git revert`
- The repo *is* the documentation

## Why This Matters for a Homelab

A personal homelab isn't a production environment, but it shares the same challenges:

1. **Memory is unreliable** - Six months from now, you won't remember why you configured Caddy that way
2. **Experimentation is constant** - You try things, break things, want to undo things
3. **AI assistance needs context** - Claude can help much more effectively when it can read your infrastructure as code

## The BlumeOps Approach

BlumeOps uses layered GitOps:

| Layer | Tool | What it manages |
|-------|------|-----------------|
| **Tailnet** | [[reference/infrastructure/tailscale|Pulumi]] | ACLs, tags, DNS |
| **Host config** | [[reference/ansible/roles|Ansible]] | Services on [[indri]] |
| **Kubernetes** | [[argocd|ArgoCD]] | Containerized workloads |

Each layer has its own reconciliation loop:
- Pulumi applies on `mise run tailnet-up`
- Ansible applies on `mise run provision-indri`
- ArgoCD watches Git and syncs manually or automatically
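
For the ArgoCD layer, the manual-versus-automatic choice is a per-application setting. The fragment below is a sketch of the relevant part of an Argo CD `Application` spec; whether BlumeOps enables it for any particular application is not stated here, so treat the values as illustrative.

```yaml
# Illustrative fragment of an Application spec, not taken from the repository.
spec:
  syncPolicy:
    automated:
      prune: true      # delete cluster resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
```

Omitting `syncPolicy.automated` leaves the application in manual mode: ArgoCD still reports drift from Git but waits for an operator to trigger the sync.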

## Trade-offs

GitOps isn't free:

- **Learning curve** - You need to understand Ansible, ArgoCD, Pulumi
- **Indirection** - Can't just `apt install` something; need to add it to config
- **Complexity** - More moving parts than a simple server

But for BlumeOps, the trade-off is worth it. The infrastructure is complex enough that managing it imperatively would be error-prone, and the GitOps approach enables effective AI-assisted operations.

## Related

- [[architecture]] - How the pieces fit together
- [[argocd]] - Kubernetes GitOps
- [[reference/ansible/roles|Ansible roles]] - Host configuration

@@ -9,7 +9,9 @@ Welcome to the BlumeOps documentation.
 ## Sections
 
 - [[tutorials/index | Tutorials]] - Learning-oriented guides for getting started
-- [[reference/index | Reference]] - Technical reference cards for services, infrastructure, and operations
+- [[reference/index | Reference]] - Technical specifications and service details
+- [[how-to/index | How-to]] - Task-oriented instructions for common operations
+- [[explanation/index | Explanation]] - Understanding the "why" behind BlumeOps
 
 ## About
 

@@ -20,7 +20,7 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
 | **[[tutorials/index | Tutorials]]** | Learning-oriented | "I'm new and want to understand" |
 | **[[reference/index | Reference]]** | Information-oriented | "I need specific technical details" |
 | **[[how-to/index | How-to]]** | Task-oriented | "I need to do X" |
-| **Explanation** (planned) | Understanding-oriented | "I want to understand why" |
+| **[[explanation/index | Explanation]]** | Understanding-oriented | "I want to understand why" |
 
 ## Quick Paths by Audience
 
@@ -42,9 +42,9 @@ Context for effective assistance:
 ### For External Readers
 
 Understanding what this is:
+- [[explanation/index|Explanation]] covers the "why" behind design decisions
 - [[reference/index|Reference]] shows what's actually running
 - Browse service pages to see specific implementations
-- The repo's README has project context
 
 ### For Contributors
 
@@ -58,6 +58,7 @@ Getting started with changes:
 Replicators are people who want to build their own similar homelab GitOps setup, using BlumeOps as inspiration.
 
 - [[replicating-blumeops]] provides the overview
+- [[explanation/index|Explanation]] covers architecture and design rationale
 - The `replication/` tutorials go deep on components
 - Reference pages show specific configuration choices
 