Add Phase 5: explanation documentation (#96)

## Summary

- Create `docs/explanation/` directory with index and three explanation articles
  - why-gitops: Philosophy of GitOps for homelabs (memory, rollback, AI context)
  - architecture: How pieces fit together (ASCII diagrams of hosts, data flow, secrets)
  - security-model: Tailscale zero-trust, 1Password secrets, access control philosophy
- Update docs/index.md with How-to and Explanation section links
- Update exploring-the-docs to link Explanation section

Decision log deferred to future work.

## Deployment and Testing

- [x] Pre-commit hooks pass (including doc-links validator)
- [ ] Build and deploy to docs.ops.eblu.me to verify rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/96
2026-02-03 20:33:39 -08:00 · 2026-02-03 20:33:39 -08:00 · 0a28622751
commit 0a28622751
parent e426473c59
8 changed files with 396 additions and 10 deletions
--- a/docs/README.md
+++ b/docs/README.md
@ -111,15 +111,17 @@ Task-oriented instructions for specific operations.
 **How-to URL:** https://docs.ops.eblu.me/how-to/
-### Phase 5: Explanation
+### Phase 5: Explanation (Complete)
 Understanding-oriented discussion of concepts and decisions.
- [ ] Create `explanation/` directory
+- [x] Create `explanation/` directory
- [ ] "Why GitOps?" - Philosophy and approach
+- [x] "Why GitOps?" - Philosophy and approach
- [ ] "Architecture Overview" - How everything fits together
+- [x] "Architecture Overview" - How everything fits together
- [ ] "Security Model" - Tailscale, secrets management, etc.
+- [x] "Security Model" - Tailscale, secrets management, etc.
- [ ] "Decision Log" - ADRs (Architecture Decision Records)
+- [ ] "Decision Log" - ADRs (Architecture Decision Records) - deferred
- [ ] Update `exploring-the-docs` with Explanation section
+- [x] Update `exploring-the-docs` with Explanation section
 **Explanation URL:** https://docs.ops.eblu.me/explanation/
 ### Phase 6: Integration & Cleanup
 - [ ] Migrate remaining useful content from `docs/zk/`
--- a/docs/changelog.d/phase5-explanation.doc.md
+++ b/docs/changelog.d/phase5-explanation.doc.md
@ -0,0 +1 @@
 Add Phase 5 explanation docs: why GitOps, architecture overview, and security model
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@ -0,0 +1,149 @@
 ---
 title: architecture
 tags:
  - explanation
  - architecture
 ---
 # Architecture Overview
 > **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
 How all the BlumeOps pieces fit together.
 ## Physical Layer
 Two always-on devices form the infrastructure backbone:
 ```
 ┌─────────────────┐     ┌─────────────────┐
 │     Indri       │     │     Sifaka      │
 │  Mac Mini M1    │────▶│  Synology NAS   │
 │  (compute)      │     │  (storage)      │
 └─────────────────┘     └─────────────────┘
        │
        │ Tailscale
        ▼
 ┌─────────────────┐
 │    Gilbert      │
 │  MacBook Air    │
 │  (workstation)  │
 └─────────────────┘
 ```
 - **[[indri]]** runs all services (native and containerized)
 - **[[sifaka]]** provides bulk storage and backup targets
 - **[[gilbert]]** is the development workstation
 ## Network Layer
 [[tailscale]] provides the network fabric:
 - All devices on tailnet `tail8d86e.ts.net`
 - ACLs control access between devices and services
 - MagicDNS provides `*.tail8d86e.ts.net` hostnames
 - No port forwarding or public IPs needed
 ## Service Routing
 Two DNS domains route to services:
 | Domain | Mechanism | Reachable from |
 |--------|-----------|----------------|
 | `*.ops.eblu.me` | Caddy reverse proxy on indri | Everywhere (k8s pods, containers, tailnet) |
 | `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
 See [[routing]] for details on when to use which.
 ## Compute Layer
 Services run in two places:
 ### Native on Indri (Ansible)
 Some services run directly on macOS:
 - [[forgejo]] - Git forge (needs filesystem access)
 - [[zot]] - Container registry (k8s depends on it)
 - [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
 - [[borgmatic]] - Backups (needs host filesystem access)
 Managed via Ansible roles in `ansible/roles/`.
 ### Kubernetes (ArgoCD)
 Most services run in minikube on indri:
 - [[grafana]], [[prometheus]], [[loki]] - Observability
 - [[miniflux]], [[navidrome]], [[kiwix]] - Applications
 - [[postgresql]] - Shared database (CloudNativePG)
 Managed via ArgoCD from `argocd/manifests/`.
 ## Data Flow
 ```
 ┌──────────────┐
 │   Git Repo   │
 │  (Forgejo)   │
 └──────┬───────┘
       │ push
       ▼
 ┌──────────────┐     ┌──────────────┐
 │   ArgoCD     │────▶│  Kubernetes  │
 │  (watches)   │sync │   (runs)     │
 └──────────────┘     └──────────────┘
                            │
       ┌────────────────────┼────────────────────┐
       ▼                    ▼                    ▼
 ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
 │   Service    │     │   Service    │     │   Service    │
 └──────────────┘     └──────────────┘     └──────────────┘
 ```
 1. Code pushed to [[forgejo]]
 2. [[argocd]] detects changes (or manual sync triggered)
 3. ArgoCD applies manifests to cluster
 4. Services start/update in Kubernetes
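The four steps above amount to comparing desired state (from Git) with live state (in the cluster) and acting on the difference. The following is a minimal conceptual sketch of that comparison, not ArgoCD's actual implementation: `Action`, `plan_sync`, and the service versions are hypothetical names chosen for illustration. Deletion is gated behind a `prune` flag because in ArgoCD pruning depends on the sync policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str  # name of the resource the action applies to


def plan_sync(desired: dict[str, str], live: dict[str, str], prune: bool = False) -> list[Action]:
    """Return the actions needed to make `live` match `desired`.

    Both arguments map a resource name to a version string (a stand-in
    for a full manifest). This is an illustrative model only.
    """
    actions = []
    for name in sorted(desired):
        if name not in live:
            actions.append(Action("create", name))
        elif live[name] != desired[name]:
            actions.append(Action("update", name))
    if prune:
        # Resources that exist in the cluster but are no longer in Git.
        for name in sorted(live):
            if name not in desired:
                actions.append(Action("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"grafana": "v2", "miniflux": "v1"}  # hypothetical Git state
    live = {"grafana": "v1", "kiwix": "v1"}        # hypothetical cluster state
    for action in plan_sync(desired, live, prune=True):
        print(action.kind, action.name)
```

When the two states already match, the plan is empty, which is the steady state a sync aims for.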
 ## Observability
 ```
 ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
 │   Alloy     │────▶│ Prometheus  │────▶│   Grafana   │
 │ (collector) │     │  (metrics)  │     │ (dashboards)│
 └─────────────┘     └─────────────┘     └─────────────┘
       │                                       ▲
       │            ┌─────────────┐            │
       └───────────▶│    Loki     │────────────┘
                    │   (logs)    │
                    └─────────────┘
 ```
 [[alloy]] runs in two places:
 - On indri: collects host metrics and logs
 - In k8s: collects pod logs and service probes
 See [[observability]] for details.
 ## Secrets Flow
 ```
 ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
 │  1Password  │────▶│  1Password  │────▶│   External  │
 │   (vault)   │     │   Connect   │     │   Secrets   │
 └─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │  K8s Secret │
                                        └─────────────┘
 ```
 Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].
 For Ansible, secrets are fetched via the `op` CLI in playbook `pre_tasks`.
 ## Related
 - [[why-gitops]] - Philosophy behind this approach
 - [[security-model]] - Access control and secrets
 - [[routing]] - Service routing details
--- a/docs/explanation/index.md
+++ b/docs/explanation/index.md
@ -0,0 +1,22 @@
 ---
 title: explanation
 tags:
  - explanation
 ---
 # Explanation
 Understanding-oriented content explaining the "why" behind BlumeOps design decisions.
 ## Philosophy
 | Article | Description |
 |---------|-------------|
 | [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |
 ## Design
 | Article | Description |
 |---------|-------------|
 | [[architecture]] | How all the pieces fit together |
 | [[security-model]] | Network security, secrets, and access control |
--- a/docs/explanation/security-model.md
+++ b/docs/explanation/security-model.md
@ -0,0 +1,139 @@
 ---
 title: security-model
 tags:
  - explanation
  - security
 ---
 # Security Model
 > **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
 How BlumeOps handles network security, secrets, and access control.
 ## Network Security: Tailscale
 The foundational security decision is using [[tailscale]] as the network layer.
 ### Zero Trust Networking
 BlumeOps has no public IP addresses or port forwarding. All services are only accessible via Tailscale:
 - **No inbound attack surface** from the public internet - nothing listens on a public address
 - **Encrypted by default** - WireGuard encryption for all traffic
 - **Identity-based access** - ACLs based on user/device identity, not IP addresses
 ### Defense in Depth
 Even within the tailnet, access is restricted:
 ```
 Internet ──X──▶ Services (no public access)
 Tailnet:
  Admin ────────▶ All services
  Member ───────▶ User-facing services only
  Homelab tag ──▶ NAS (for backups)
 ```
 See [[tailscale]] for the full ACL matrix.
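The diagram above describes a default-deny policy keyed on identity. As a rough illustration only, it can be modelled as a lookup table. This is not Tailscale's ACL format, and the service groupings below (which services count as user-facing) are assumptions made for the example rather than the real ACL matrix.

```python
# Hypothetical service groups; the real grouping lives in the Tailscale ACLs.
USER_FACING = {"miniflux", "navidrome", "jellyfin"}
ALL_SERVICES = USER_FACING | {"argocd", "grafana", "nas"}

# Identity -> set of services that identity may reach.
RULES = {
    "admin": ALL_SERVICES,
    "member": USER_FACING,
    "tag:homelab": {"nas"},  # homelab servers only need the NAS for backups
}


def is_allowed(identity: str, service: str) -> bool:
    """Default deny: access passes only if a rule explicitly grants it."""
    return service in RULES.get(identity, set())


if __name__ == "__main__":
    for identity in ("admin", "member", "tag:homelab", "internet"):
        print(identity, sorted(s for s in ALL_SERVICES if is_allowed(identity, s)))
```

An identity with no entry in the table, such as an arbitrary internet client, gets nothing, which corresponds to the `──X──▶` line in the diagram.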
 ## Secrets Management
 Secrets follow a hierarchy:
 ### Source of Truth: 1Password
 All secrets originate in 1Password's `blumeops` vault:
 - API keys, tokens, passwords
 - SSH keys and certificates
 - OAuth credentials
 ### Kubernetes: External Secrets Operator
 [[external-secrets]] syncs secrets from 1Password to Kubernetes:
 ```
 1Password ──▶ 1Password Connect ──▶ ExternalSecret ──▶ K8s Secret
 ```
 Services reference native Kubernetes Secrets; they don't know about 1Password.
 ### Ansible: op CLI
 Ansible playbooks fetch secrets at runtime via the `op` CLI:
 ```yaml
 - name: Fetch secret
   command: op item get <id> --fields password --reveal
   delegate_to: localhost
 ```
 Secrets are held in memory as Ansible facts, never written to disk.
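For scripts outside Ansible, the same lookup could be sketched in Python. This is an illustrative assumption, not code from the repository: `build_op_command` and `fetch_secret` are hypothetical helpers, and the only part taken from the playbook above is the `op item get <id> --fields password --reveal` invocation. The secret is returned as a string and stays in memory.

```python
import subprocess


def build_op_command(item_id: str, field: str = "password") -> list[str]:
    """Build the argv for the same `op item get` call the playbook uses."""
    return ["op", "item", "get", item_id, "--fields", field, "--reveal"]


def fetch_secret(item_id: str, field: str = "password") -> str:
    """Run the `op` CLI and return the secret value without writing it to disk.

    Requires the 1Password CLI to be installed and signed in; raises
    subprocess.CalledProcessError if the lookup fails.
    """
    result = subprocess.run(
        build_op_command(item_id, field),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing the command as a list rather than a shell string avoids quoting problems if an item ID contains unusual characters.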
 ### Git Repository
 The repository is public. Secrets must never be committed:
 - `.gitignore` excludes sensitive patterns
 - Pre-commit hooks scan for potential secrets (TruffleHog)
 - All config files use references to secrets, not values
 ## Access Control Philosophy
 ### Principle of Least Privilege
 Services and devices get minimum necessary access:
 | Entity | Access |
 |--------|--------|
 | Admin users | Everything |
 | Member users | User-facing services only |
 | Homelab servers | Only what they need (NAS for backups) |
 | K8s pods | No Tailscale access (use Caddy proxy) |
 ### Tagged Devices vs User Devices
 Important Tailscale concept:
 - **User devices** (like gilbert) have user identity and inherit user ACLs
 - **Tagged devices** (like indri with `tag:homelab`) lose user identity
 Don't tag user devices - it breaks user-based access rules.
 ## Authentication Patterns
 ### Service-to-Service
 Internal services use:
 - Kubernetes service discovery (no auth needed within cluster)
 - Tailscale identity for cross-host communication
 ### User-to-Service
 Users authenticate via:
 - Service-specific credentials (stored in 1Password)
 - Tailscale identity, for services that support it (planned)
 ### AI/Automation Access
 Claude Code and automation use:
 - SSH keys for git operations
 - ArgoCD tokens for deployments
 - 1Password CLI for secret retrieval (requires user approval)
 ## What's Not Protected
 Honest assessment of security boundaries:
 - **Local network attacks** - If someone is on your home WiFi, they could potentially access the NAS directly
 - **Physical access** - No disk encryption on servers (trade-off for reliability)
 - **Supply chain** - Container images from upstream registries
 - **Operator error** - Misconfigured ACLs or leaked credentials
 The model assumes a trusted home network and focuses on protecting against internet-based attacks.
 ## Related
 - [[tailscale]] - ACL configuration
 - [[1password]] - Secrets management
 - [[external-secrets]] - Kubernetes secrets
 - [[architecture]] - Overall system design
--- a/docs/explanation/why-gitops.md
+++ b/docs/explanation/why-gitops.md
@ -0,0 +1,70 @@
 ---
 title: why-gitops
 tags:
  - explanation
  - philosophy
 ---
 # Why GitOps?
 > **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
 BlumeOps uses GitOps principles for managing personal infrastructure. This might seem like overkill for a homelab, but there are good reasons.
 ## The Problem with Manual Infrastructure
 Traditional server management involves SSHing into machines and running commands. This works, but creates problems:
 - **Drift**: The actual state diverges from what you think it is
 - **Amnesia**: You forget what you changed and why
 - **Fragility**: One bad command can break things with no easy rollback
 - **Bus factor**: Only you know how it works (even AI assistants struggle without context)
 ## Git as the Source of Truth
 GitOps inverts the model: instead of pushing changes to servers, you commit desired state to Git, and automation pulls it into reality.
 **Benefits:**
 - Every change is tracked with commit history
 - Pull requests enable review before deployment
 - Rollback is just `git revert`
 - The repo *is* the documentation
 ## Why This Matters for a Homelab
 A personal homelab isn't a production environment, but it shares the same challenges:
 1. **Memory is unreliable** - Six months from now, you won't remember why you configured Caddy that way
 2. **Experimentation is constant** - You try things, break things, want to undo things
 3. **AI assistance needs context** - Claude can help much more effectively when it can read your infrastructure as code
 ## The BlumeOps Approach
 BlumeOps uses layered GitOps:
 | Layer | Tool | What it manages |
 |-------|------|-----------------|
 | **Tailnet** | [[reference/infrastructure/tailscale|Pulumi]] | ACLs, tags, DNS |
 | **Host config** | [[reference/ansible/roles|Ansible]] | Services on [[indri]] |
 | **Kubernetes** | [[argocd|ArgoCD]] | Containerized workloads |
 Each layer is reconciled in its own way:
 - Pulumi applies on `mise run tailnet-up`
 - Ansible applies on `mise run provision-indri`
 - ArgoCD watches Git and syncs manually or automatically
 ## Trade-offs
 GitOps isn't free:
 - **Learning curve** - You need to understand Ansible, ArgoCD, Pulumi
 - **Indirection** - Can't just `apt install` something; need to add it to config
 - **Complexity** - More moving parts than a simple server
 But for BlumeOps, the trade-off is worth it. The infrastructure is complex enough that managing it imperatively would be error-prone, and the GitOps approach enables effective AI-assisted operations.
 ## Related
 - [[architecture]] - How the pieces fit together
 - [[argocd]] - Kubernetes GitOps
 - [[reference/ansible/roles|Ansible roles]] - Host configuration
--- a/docs/index.md
+++ b/docs/index.md
@ -9,7 +9,9 @@ Welcome to the BlumeOps documentation.
 ## Sections
 - [[tutorials/index | Tutorials]] - Learning-oriented guides for getting started
- [[reference/index | Reference]] - Technical reference cards for services, infrastructure, and operations
+- [[reference/index | Reference]] - Technical specifications and service details
 - [[how-to/index | How-to]] - Task-oriented instructions for common operations
 - [[explanation/index | Explanation]] - Understanding the "why" behind BlumeOps
 ## About
--- a/docs/tutorials/exploring-the-docs.md
+++ b/docs/tutorials/exploring-the-docs.md
@ -20,7 +20,7 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
 | **[[tutorials/index | Tutorials]]** | Learning-oriented | "I'm new and want to understand" |
 | **[[reference/index | Reference]]** | Information-oriented | "I need specific technical details" |
 | **[[how-to/index | How-to]]** | Task-oriented | "I need to do X" |
-| **Explanation** (planned) | Understanding-oriented | "I want to understand why" |
+| **[[explanation/index | Explanation]]** | Understanding-oriented | "I want to understand why" |
 ## Quick Paths by Audience
@ -42,9 +42,9 @@ Context for effective assistance:
 ### For External Readers
 Understanding what this is:
 - [[explanation/index|Explanation]] covers the "why" behind design decisions
 - [[reference/index|Reference]] shows what's actually running
 - Browse service pages to see specific implementations
 - The repo's README has project context
 ### For Contributors
@ -58,6 +58,7 @@ Getting started with changes:
 Replicators are people who want to build their own similar homelab GitOps setup, using BlumeOps as inspiration.
 - [[replicating-blumeops]] provides the overview
 - [[explanation/index|Explanation]] covers architecture and design rationale
 - The `replication/` tutorials go deep on components
 - Reference pages show specific configuration choices