Add Phase 5: explanation documentation

Understanding-oriented content explaining the "why" behind BlumeOps:

- why-gitops: Philosophy of infrastructure-as-code for homelabs
- architecture: How all the pieces fit together (hosts, services, data flow)
- security-model: Tailscale networking, 1Password secrets, access control

Also updates docs/index.md with How-to and Explanation sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:23:45 -08:00 · 2026-02-03 20:23:45 -08:00 · 7be34dca60
commit 7be34dca60
parent e426473c59
8 changed files with 390 additions and 10 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -111,15 +111,17 @@ Task-oriented instructions for specific operations.

 **How-to URL:** https://docs.ops.eblu.me/how-to/

-### Phase 5: Explanation
+### Phase 5: Explanation (Complete)
 Understanding-oriented discussion of concepts and decisions.

-- [ ] Create `explanation/` directory
-- [ ] "Why GitOps?" - Philosophy and approach
-- [ ] "Architecture Overview" - How everything fits together
-- [ ] "Security Model" - Tailscale, secrets management, etc.
-- [ ] "Decision Log" - ADRs (Architecture Decision Records)
-- [ ] Update `exploring-the-docs` with Explanation section
+- [x] Create `explanation/` directory
+- [x] "Why GitOps?" - Philosophy and approach
+- [x] "Architecture Overview" - How everything fits together
+- [x] "Security Model" - Tailscale, secrets management, etc.
+- [ ] "Decision Log" - ADRs (Architecture Decision Records) - deferred
+- [x] Update `exploring-the-docs` with Explanation section
+
+**Explanation URL:** https://docs.ops.eblu.me/explanation/

 ### Phase 6: Integration & Cleanup
 - [ ] Migrate remaining useful content from `docs/zk/`
--- a/docs/changelog.d/phase5-explanation.doc.md
+++ b/docs/changelog.d/phase5-explanation.doc.md
@@ -0,0 +1 @@
+Add Phase 5 explanation docs: why GitOps, architecture overview, and security model
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@@ -0,0 +1,147 @@
+---
+title: architecture
+tags:
+  - explanation
+  - architecture
+---
+
+# Architecture Overview
+
+How all the BlumeOps pieces fit together.
+
+## Physical Layer
+
+Two always-on devices form the infrastructure backbone:
+
+```
+┌─────────────────┐     ┌─────────────────┐
+│     Indri       │     │     Sifaka      │
+│  Mac Mini M1    │────▶│  Synology NAS   │
+│  (compute)      │     │  (storage)      │
+└─────────────────┘     └─────────────────┘
+        │
+        │ Tailscale
+        ▼
+┌─────────────────┐
+│    Gilbert      │
+│  MacBook Air    │
+│  (workstation)  │
+└─────────────────┘
+```
+
+- **[[indri]]** runs all services (native and containerized)
+- **[[sifaka]]** provides bulk storage and backup targets
+- **[[gilbert]]** is the development workstation
+
+## Network Layer
+
+[[tailscale]] provides the network fabric:
+
+- All devices on tailnet `tail8d86e.ts.net`
+- ACLs control access between devices and services
+- MagicDNS provides `*.tail8d86e.ts.net` hostnames
+- No port forwarding or public IPs needed
+
+## Service Routing
+
+Two DNS domains route to services:
+
+| Domain | Mechanism | Reachable from |
+|--------|-----------|----------------|
+| `*.ops.eblu.me` | Caddy reverse proxy on indri | Everywhere (k8s pods, containers, tailnet) |
+| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
+
+See [[routing]] for details on when to use which.
+
+## Compute Layer
+
+Services run in two places:
+
+### Native on Indri (Ansible)
+
+Some services run directly on macOS:
+- [[forgejo]] - Git forge (needs filesystem access)
+- [[zot]] - Container registry (k8s depends on it)
+- [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
+- [[borgmatic]] - Backups (needs host filesystem access)
+
+Managed via Ansible roles in `ansible/roles/`.
+
+### Kubernetes (ArgoCD)
+
+Most services run in minikube on indri:
+- [[grafana]], [[prometheus]], [[loki]] - Observability
+- [[miniflux]], [[navidrome]], [[kiwix]] - Applications
+- [[postgresql]] - Shared database (CloudNativePG)
+
+Managed via ArgoCD from `argocd/manifests/`.
+
+## Data Flow
+
+```
+┌──────────────┐
+│   Git Repo   │
+│  (Forgejo)   │
+└──────┬───────┘
+       │ push
+       ▼
+┌──────────────┐     ┌──────────────┐
+│   ArgoCD     │────▶│  Kubernetes  │
+│  (watches)   │sync │   (runs)     │
+└──────────────┘     └──────────────┘
+                            │
+       ┌────────────────────┼────────────────────┐
+       ▼                    ▼                    ▼
+┌──────────────┐     ┌──────────────┐     ┌──────────────┐
+│   Service    │     │   Service    │     │   Service    │
+└──────────────┘     └──────────────┘     └──────────────┘
+```
+
+1. Code pushed to [[forgejo]]
+2. [[argocd]] detects changes (or manual sync triggered)
+3. ArgoCD applies manifests to cluster
+4. Services start/update in Kubernetes
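
The four steps above could be wired up with an Argo CD `Application` resource. The following is a minimal sketch, not the manifest BlumeOps actually uses: the application name, repository URL, branch, and namespaces are hypothetical placeholders, and only the `argocd/manifests/` path prefix comes from the text.

```yaml
# Hypothetical Argo CD Application; every name and URL here is a placeholder.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service
  namespace: argocd                     # assumes Argo CD runs in the "argocd" namespace
spec:
  project: default
  source:
    repoURL: https://forge.example.invalid/owner/repo.git   # placeholder for the Forgejo repo
    targetRevision: main                # assumed branch name
    path: argocd/manifests/example-service
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: example-service
  syncPolicy:
    automated:                          # remove this block to require manual syncs
      prune: true                       # delete resources that were removed from Git
      selfHeal: true                    # revert out-of-band changes in the cluster
    syncOptions:
      - CreateNamespace=true
```

With the `automated` block present, step 2 happens on Argo CD's polling interval; without it, a push only marks the application as out of sync until someone triggers a sync.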
+
+## Observability
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│   Alloy     │────▶│ Prometheus  │────▶│   Grafana   │
+│ (collector) │     │  (metrics)  │     │ (dashboards)│
+└─────────────┘     └─────────────┘     └─────────────┘
+       │                                       ▲
+       │            ┌─────────────┐            │
+       └───────────▶│    Loki     │────────────┘
+                    │   (logs)    │
+                    └─────────────┘
+```
+
+[[alloy]] runs in two places:
+- On indri: collects host metrics and logs
+- In k8s: collects pod logs and service probes
+
+See [[observability]] for details.
+
+## Secrets Flow
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  1Password  │────▶│  1Password  │────▶│   External  │
+│   (vault)   │     │   Connect   │     │   Secrets   │
+└─────────────┘     └─────────────┘     └─────────────┘
+                                               │
+                                               ▼
+                                        ┌─────────────┐
+                                        │  K8s Secret │
+                                        └─────────────┘
+```
+
+Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].
+
+For Ansible, secrets are fetched via `op` CLI in playbook pre_tasks.
+
+## Related
+
+- [[why-gitops]] - Philosophy behind this approach
+- [[security-model]] - Access control and secrets
+- [[routing]] - Service routing details
--- a/docs/explanation/index.md
+++ b/docs/explanation/index.md
@@ -0,0 +1,22 @@
+---
+title: explanation
+tags:
+  - explanation
+---
+
+# Explanation
+
+Understanding-oriented content explaining the "why" behind BlumeOps design decisions.
+
+## Philosophy
+
+| Article | Description |
+|---------|-------------|
+| [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |
+
+## Design
+
+| Article | Description |
+|---------|-------------|
+| [[architecture]] | How all the pieces fit together |
+| [[security-model]] | Network security, secrets, and access control |
--- a/docs/explanation/security-model.md
+++ b/docs/explanation/security-model.md
@@ -0,0 +1,137 @@
+---
+title: security-model
+tags:
+  - explanation
+  - security
+---
+
+# Security Model
+
+How BlumeOps handles network security, secrets, and access control.
+
+## Network Security: Tailscale
+
+The foundational security decision is using [[tailscale]] as the network layer.
+
+### Zero Trust Networking
+
+BlumeOps has no public IP addresses or port forwarding. Services are accessible only via Tailscale:
+
+- **No inbound attack surface** from the public internet
+- **Encrypted by default** - WireGuard encryption for all traffic
+- **Identity-based access** - ACLs based on user/device identity, not IP addresses
+
+### Defense in Depth
+
+Even within the tailnet, access is restricted:
+
+```
+Internet ──X──▶ Services (no public access)
+
+Tailnet:
+  Admin ────────▶ All services
+  Member ───────▶ User-facing services only
+  Homelab tag ──▶ NAS (for backups)
+```
+
+See [[tailscale]] for the full ACL matrix.
+
+## Secrets Management
+
+Secrets follow a hierarchy:
+
+### Source of Truth: 1Password
+
+All secrets originate in 1Password's `blumeops` vault:
+- API keys, tokens, passwords
+- SSH keys and certificates
+- OAuth credentials
+
+### Kubernetes: External Secrets Operator
+
+[[external-secrets]] syncs secrets from 1Password to Kubernetes:
+
+```
+1Password ──▶ 1Password Connect ──▶ ExternalSecret ──▶ K8s Secret
+```
+
+Services reference native Kubernetes Secrets; they don't know about 1Password.
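
As an illustration of that chain, an `ExternalSecret` and a workload consuming the resulting Secret might look like the sketch below. All resource names, the store name, and the 1Password item title are hypothetical, and the `apiVersion` depends on the installed External Secrets Operator release (older releases serve `external-secrets.io/v1beta1` instead of `v1`).

```yaml
# Hypothetical ExternalSecret; names are placeholders, not BlumeOps resources.
apiVersion: external-secrets.io/v1      # use v1beta1 on older operator releases
kind: ExternalSecret
metadata:
  name: example-credentials
  namespace: example-service
spec:
  refreshInterval: 1h                   # how often to re-read the 1Password item
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword                   # assumed store backed by 1Password Connect
  target:
    name: example-credentials           # name of the Kubernetes Secret to create
  data:
    - secretKey: password               # key inside the Kubernetes Secret
      remoteRef:
        key: example-item               # 1Password item title (placeholder)
        property: password              # field within that item
---
# The workload only sees an ordinary Secret reference.
apiVersion: v1
kind: Pod
metadata:
  name: example-service
  namespace: example-service
spec:
  containers:
    - name: app
      image: registry.example.invalid/example-service:latest   # placeholder image
      env:
        - name: APP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: example-credentials
              key: password
```

Because the Pod references only `example-credentials`, the secret backend could be swapped without touching the workload manifest.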
+
+### Ansible: op CLI
+
+Ansible playbooks fetch secrets at runtime via `op` CLI:
+
+```yaml
+- name: Fetch secret
+  command: op item get <id> --fields password --reveal
+  delegate_to: localhost
+```
+
+Secrets are held in memory as Ansible facts, never written to disk.
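
One way to keep the value in memory only is to register the command output and copy it into a fact, with logging suppressed on both tasks. This is a sketch under assumptions: the play name, the variable names `op_result` and `example_password`, and the choice of `indri` as the target are illustrative, and `<id>` remains a placeholder for the 1Password item ID.

```yaml
# Hypothetical play; variable names are illustrative.
- name: Provision example service
  hosts: indri
  pre_tasks:
    - name: Fetch secret from 1Password on the control machine
      ansible.builtin.command: op item get <id> --fields password --reveal
      delegate_to: localhost
      register: op_result
      changed_when: false               # a read-only lookup never changes anything
      no_log: true                      # keep the value out of task output and logs

    - name: Hold the secret as an in-memory fact
      ansible.builtin.set_fact:
        example_password: "{{ op_result.stdout }}"
      no_log: true
```

Later tasks can reference `example_password` in templates or module arguments; any task that handles it should set `no_log: true` as well, otherwise the value can appear in verbose output.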
+
+### Git Repository
+
+The repository is public. Secrets must never be committed:
+- `.gitignore` excludes sensitive patterns
+- Pre-commit hooks scan for potential secrets (TruffleHog)
+- All config files use references to secrets, not values
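
A pre-commit hook for the TruffleHog scan could be declared roughly as follows. This is a sketch modeled on the local-hook pattern in TruffleHog's own README, not the repository's actual `.pre-commit-config.yaml`; it assumes the `trufflehog` binary is already installed on the machine running the hook.

```yaml
# Hypothetical .pre-commit-config.yaml fragment; assumes trufflehog is on PATH.
repos:
  - repo: local
    hooks:
      - id: trufflehog
        name: TruffleHog
        description: Scan new commits for secrets before they leave the machine.
        entry: bash -c 'trufflehog git file://. --since-commit HEAD --fail'
        language: system
        pass_filenames: false           # the scan runs on the repo, not on a file list
```

The `--fail` flag makes the command exit non-zero when a finding is reported, which is what blocks the commit.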
+
+## Access Control Philosophy
+
+### Principle of Least Privilege
+
+Services and devices get minimum necessary access:
+
+| Entity | Access |
+|--------|--------|
+| Admin users | Everything |
+| Member users | User-facing services only |
+| Homelab servers | Only what they need (NAS for backups) |
+| K8s pods | No Tailscale access (use Caddy proxy) |
+
+### Tagged Devices vs User Devices
+
+Important Tailscale concept:
+- **User devices** (like gilbert) have user identity and inherit user ACLs
+- **Tagged devices** (like indri with `tag:homelab`) lose user identity
+
+Don't tag user devices: a tagged device loses its user identity, so user-based access rules no longer apply to it.
+
+## Authentication Patterns
+
+### Service-to-Service
+
+Internal services use:
+- Kubernetes service discovery (no auth needed within cluster)
+- Tailscale identity for cross-host communication
+
+### User-to-Service
+
+Users authenticate via:
+- Service-specific credentials (stored in 1Password)
+- Tailscale identity, for services that support it (planned, not yet in use)
+
+### AI/Automation Access
+
+Claude Code and automation use:
+- SSH keys for git operations
+- ArgoCD tokens for deployments
+- 1Password CLI for secret retrieval (requires user approval)
+
+## What's Not Protected
+
+Honest assessment of security boundaries:
+
+- **Local network attacks** - Anyone on the home WiFi could potentially reach the NAS directly, bypassing the tailnet
+- **Physical access** - No disk encryption on servers (trade-off for reliability)
+- **Supply chain** - Container images from upstream registries
+- **Operator error** - Misconfigured ACLs or leaked credentials
+
+The model assumes a trusted home network and focuses on protecting against internet-based attacks.
+
+## Related
+
+- [[tailscale]] - ACL configuration
+- [[1password]] - Secrets management
+- [[external-secrets]] - Kubernetes secrets
+- [[architecture]] - Overall system design
--- a/docs/explanation/why-gitops.md
+++ b/docs/explanation/why-gitops.md
@@ -0,0 +1,68 @@
+---
+title: why-gitops
+tags:
+  - explanation
+  - philosophy
+---
+
+# Why GitOps?
+
+BlumeOps uses GitOps principles for managing personal infrastructure. This might seem like overkill for a homelab, but a homelab runs into the same drift, memory, and rollback problems that motivate GitOps in production.
+
+## The Problem with Manual Infrastructure
+
+Traditional server management involves SSHing into machines and running commands. This works, but creates problems:
+
+- **Drift**: The actual state diverges from what you think it is
+- **Amnesia**: You forget what you changed and why
+- **Fragility**: One bad command can break things with no easy rollback
+- **Bus factor**: Only you know how it works (even AI assistants struggle without context)
+
+## Git as the Source of Truth
+
+GitOps inverts the model: instead of pushing changes to servers, you commit desired state to Git, and automation pulls it into reality.
+
+**Benefits:**
+- Every change is tracked with commit history
+- Pull requests enable review before deployment
+- Rollback is a `git revert` followed by the usual apply or sync
+- The repo *is* the documentation
+
+## Why This Matters for a Homelab
+
+A personal homelab isn't a production environment, but it shares the same challenges:
+
+1. **Memory is unreliable** - Six months from now, you won't remember why you configured Caddy that way
+2. **Experimentation is constant** - You try things, break things, want to undo things
+3. **AI assistance needs context** - Claude can help much more effectively when it can read your infrastructure as code
+
+## The BlumeOps Approach
+
+BlumeOps uses layered GitOps:
+
+| Layer | Tool | What it manages |
+|-------|------|-----------------|
+| **Tailnet** | [[reference/infrastructure/tailscale|Pulumi]] | ACLs, tags, DNS |
+| **Host config** | [[reference/ansible/roles|Ansible]] | Services on [[indri]] |
+| **Kubernetes** | [[argocd|ArgoCD]] | Containerized workloads |
+
+Each layer has its own way of reconciling Git with the running system:
+- Pulumi applies on `mise run tailnet-up`
+- Ansible applies on `mise run provision-indri`
+- ArgoCD watches Git and syncs manually or automatically
+
+## Trade-offs
+
+GitOps isn't free:
+
+- **Learning curve** - You need to understand Ansible, ArgoCD, Pulumi
+- **Indirection** - Can't just `apt install` something; need to add it to config
+- **Complexity** - More moving parts than a simple server
+
+But for BlumeOps, the trade-off is worth it. The infrastructure is complex enough that managing it imperatively would be error-prone, and the GitOps approach enables effective AI-assisted operations.
+
+## Related
+
+- [[architecture]] - How the pieces fit together
+- [[argocd]] - Kubernetes GitOps
+- [[reference/ansible/roles|Ansible roles]] - Host configuration
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,7 +9,9 @@ Welcome to the BlumeOps documentation.
 ## Sections

 - [[tutorials/index | Tutorials]] - Learning-oriented guides for getting started
-- [[reference/index | Reference]] - Technical reference cards for services, infrastructure, and operations
+- [[reference/index | Reference]] - Technical specifications and service details
+- [[how-to/index | How-to]] - Task-oriented instructions for common operations
+- [[explanation/index | Explanation]] - Understanding the "why" behind BlumeOps

 ## About

--- a/docs/tutorials/exploring-the-docs.md
+++ b/docs/tutorials/exploring-the-docs.md
@@ -20,7 +20,7 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
 | **[[tutorials/index | Tutorials]]** | Learning-oriented | "I'm new and want to understand" |
 | **[[reference/index | Reference]]** | Information-oriented | "I need specific technical details" |
 | **[[how-to/index | How-to]]** | Task-oriented | "I need to do X" |
-| **Explanation** (planned) | Understanding-oriented | "I want to understand why" |
+| **[[explanation/index | Explanation]]** | Understanding-oriented | "I want to understand why" |

 ## Quick Paths by Audience

@@ -42,9 +42,9 @@ Context for effective assistance:
 ### For External Readers

 Understanding what this is:
+- [[explanation/index|Explanation]] covers the "why" behind design decisions
 - [[reference/index|Reference]] shows what's actually running
 - Browse service pages to see specific implementations
-- The repo's README has project context

 ### For Contributors

@@ -58,6 +58,7 @@ Getting started with changes:
 Replicators are people who want to build their own similar homelab GitOps setup, using BlumeOps as inspiration.

 - [[replicating-blumeops]] provides the overview
+- [[explanation/index|Explanation]] covers architecture and design rationale
 - The `replication/` tutorials go deep on components
 - Reference pages show specific configuration choices