Add Phase 5: explanation documentation (#96)

## Summary

- Create `docs/explanation/` directory with index and three explanation articles
  - why-gitops: Philosophy of GitOps for homelabs (memory, rollback, AI context)
  - architecture: How pieces fit together (ASCII diagrams of hosts, data flow, secrets)
  - security-model: Tailscale zero-trust, 1Password secrets, access control philosophy
- Update docs/index.md with How-to and Explanation section links
- Update exploring-the-docs to link Explanation section

Decision log deferred to future work.

## Deployment and Testing

- [x] Pre-commit hooks pass (including doc-links validator)
- [ ] Build and deploy to docs.ops.eblu.me to verify rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/96
2026-02-03 20:33:39 -08:00 · 0a28622751
commit 0a28622751
parent e426473c59
8 changed files with 396 additions and 10 deletions
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@ -0,0 +1,149 @@
+---
+title: architecture
+tags:
+  - explanation
+  - architecture
+---
+
+# Architecture Overview
+
+> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
+
+How all the BlumeOps pieces fit together.
+
+## Physical Layer
+
+Two always-on devices form the infrastructure backbone:
+
+```
+┌─────────────────┐     ┌─────────────────┐
+│     Indri       │     │     Sifaka      │
+│  Mac Mini M1    │────▶│  Synology NAS   │
+│  (compute)      │     │  (storage)      │
+└─────────────────┘     └─────────────────┘
+        │
+        │ Tailscale
+        ▼
+┌─────────────────┐
+│    Gilbert      │
+│  MacBook Air    │
+│  (workstation)  │
+└─────────────────┘
+```
+
+- **[[indri]]** runs all services (native and containerized)
+- **[[sifaka]]** provides bulk storage and backup targets
+- **[[gilbert]]** is the development workstation
+
+## Network Layer
+
+[[tailscale]] provides the network fabric:
+
+- All devices on tailnet `tail8d86e.ts.net`
+- ACLs control access between devices and services
+- MagicDNS provides `*.tail8d86e.ts.net` hostnames
+- No port forwarding or public IPs needed
+
+## Service Routing
+
+Two DNS domains route to services:
+
+| Domain | Mechanism | Reachable from |
+|--------|-----------|----------------|
+| `*.ops.eblu.me` | Caddy reverse proxy on indri | Any internal client (k8s pods, containers, tailnet) |
+| `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
+
+See [[routing]] for details on when to use which.
+
+## Compute Layer
+
+Services run in two places:
+
+### Native on Indri (Ansible)
+
+Some services run directly on macOS:
+- [[forgejo]] - Git forge (needs filesystem access)
+- [[zot]] - Container registry (k8s depends on it)
+- [[jellyfin]] - Media server (needs VideoToolbox hardware transcoding)
+- [[borgmatic]] - Backups (needs host filesystem access)
+
+Managed via Ansible roles in `ansible/roles/`.
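+
+A play wiring these roles together might look like the sketch below. The inventory host name and the role names are assumptions that mirror the service list above, not the repository's actual layout.
+
+```yaml
+# Hypothetical playbook; host and role names are illustrative.
+- name: Provision indri
+  hosts: indri          # assumed inventory name
+  roles:
+    - forgejo           # one role per native service (assumed naming)
+    - zot
+    - jellyfin
+    - borgmatic
+```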
+
+### Kubernetes (ArgoCD)
+
+Most services run in minikube on indri:
+- [[grafana]], [[prometheus]], [[loki]] - Observability
+- [[miniflux]], [[navidrome]], [[kiwix]] - Applications
+- [[postgresql]] - Shared database (CloudNativePG)
+
+Managed via ArgoCD from `argocd/manifests/`.
+
+## Data Flow
+
+```
+┌──────────────┐
+│   Git Repo   │
+│  (Forgejo)   │
+└──────┬───────┘
+       │ push
+       ▼
+┌──────────────┐     ┌──────────────┐
+│   ArgoCD     │────▶│  Kubernetes  │
+│  (watches)   │sync │   (runs)     │
+└──────────────┘     └──────────────┘
+                            │
+       ┌────────────────────┼────────────────────┐
+       ▼                    ▼                    ▼
+┌──────────────┐     ┌──────────────┐     ┌──────────────┐
+│   Service    │     │   Service    │     │   Service    │
+└──────────────┘     └──────────────┘     └──────────────┘
+```
+
+1. Code pushed to [[forgejo]]
+2. [[argocd]] detects changes (or manual sync triggered)
+3. ArgoCD applies manifests to cluster
+4. Services start/update in Kubernetes
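+
+As a rough illustration of steps 2 and 3, an ArgoCD `Application` points at a path in the Git repo and at a destination namespace. This is a sketch, not a manifest copied from the BlumeOps repo: the application name, clone URL, branch, manifest path, and namespace are all assumptions.
+
+```yaml
+# Hypothetical Application; every name and URL below is illustrative.
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: miniflux
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://forge.ops.eblu.me/eblume/blumeops.git  # assumed clone URL
+    targetRevision: main                                     # assumed branch
+    path: argocd/manifests/miniflux                          # assumed layout
+  destination:
+    server: https://kubernetes.default.svc  # the cluster ArgoCD itself runs in
+    namespace: miniflux
+  # No syncPolicy.automated block: ArgoCD reports drift and waits for a manual sync.
+```
+
+With no automated sync policy, ArgoCD marks the application `OutOfSync` when Git changes and applies nothing until a sync is triggered.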
+
+## Observability
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│   Alloy     │────▶│ Prometheus  │────▶│   Grafana   │
+│ (collector) │     │  (metrics)  │     │ (dashboards)│
+└─────────────┘     └─────────────┘     └─────────────┘
+       │                                       ▲
+       │            ┌─────────────┐            │
+       └───────────▶│    Loki     │────────────┘
+                    │   (logs)    │
+                    └─────────────┘
+```
+
+[[alloy]] runs in two places:
+- On indri: collects host metrics and logs
+- In k8s: collects pod logs and service probes
+
+See [[observability]] for details.
+
+## Secrets Flow
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  1Password  │────▶│  1Password  │────▶│   External  │
+│   (vault)   │     │   Connect   │     │   Secrets   │
+└─────────────┘     └─────────────┘     └─────────────┘
+                                               │
+                                               ▼
+                                        ┌─────────────┐
+                                        │  K8s Secret │
+                                        └─────────────┘
+```
+
+Secrets live in 1Password and flow to Kubernetes via [[external-secrets]].
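+
+A minimal sketch of what such a sync might look like, assuming a `ClusterSecretStore` named `onepassword` backed by 1Password Connect and an item titled `miniflux-db`. Both names are hypothetical, and the `apiVersion` depends on the installed operator version.
+
+```yaml
+apiVersion: external-secrets.io/v1beta1  # newer operator releases serve external-secrets.io/v1
+kind: ExternalSecret
+metadata:
+  name: miniflux-db
+  namespace: miniflux
+spec:
+  refreshInterval: 1h          # how often to re-read the item from 1Password
+  secretStoreRef:
+    kind: ClusterSecretStore
+    name: onepassword          # hypothetical store name
+  target:
+    name: miniflux-db          # name of the K8s Secret to create
+  data:
+    - secretKey: password      # key inside the K8s Secret
+      remoteRef:
+        key: miniflux-db       # 1Password item title (hypothetical)
+        property: password     # field on that item
+```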
+
+For Ansible, secrets are fetched via the `op` CLI in playbook `pre_tasks`.
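+
+A sketch of that pattern follows. The `<id>` placeholder, the registered variable, and the fact name are hypothetical; only the overall shape (fetch on the control machine, keep the value out of logs, expose it as a fact) is taken from the description here.
+
+```yaml
+- name: Provision indri
+  hosts: indri
+  pre_tasks:
+    - name: Fetch secret from 1Password on the control machine
+      ansible.builtin.command: op item get <id> --fields password --reveal
+      delegate_to: localhost
+      register: op_result        # hypothetical variable name
+      changed_when: false        # reading a secret changes nothing
+      no_log: true               # keep the value out of task output
+
+    - name: Expose the secret to roles as a fact
+      ansible.builtin.set_fact:
+        service_password: "{{ op_result.stdout }}"  # hypothetical fact name
+      no_log: true
+```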
+
+## Related
+
+- [[why-gitops]] - Philosophy behind this approach
+- [[security-model]] - Access control and secrets
+- [[routing]] - Service routing details
--- a/docs/explanation/index.md
+++ b/docs/explanation/index.md
@ -0,0 +1,22 @@
+---
+title: explanation
+tags:
+  - explanation
+---
+
+# Explanation
+
+Understanding-oriented content explaining the "why" behind BlumeOps design decisions.
+
+## Philosophy
+
+| Article | Description |
+|---------|-------------|
+| [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |
+
+## Design
+
+| Article | Description |
+|---------|-------------|
+| [[architecture]] | How all the pieces fit together |
+| [[security-model]] | Network security, secrets, and access control |
--- a/docs/explanation/security-model.md
+++ b/docs/explanation/security-model.md
@ -0,0 +1,139 @@
+---
+title: security-model
+tags:
+  - explanation
+  - security
+---
+
+# Security Model
+
+> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
+
+How BlumeOps handles network security, secrets, and access control.
+
+## Network Security: Tailscale
+
+The foundational security decision is using [[tailscale]] as the network layer.
+
+### Zero Trust Networking
+
+BlumeOps has no public IP addresses or port forwarding. All services are only accessible via Tailscale:
+
+- **No inbound exposure** to the public internet - no service listens on a public address
+- **Encrypted by default** - WireGuard encryption for all traffic
+- **Identity-based access** - ACLs based on user/device identity, not IP addresses
+
+### Defense in Depth
+
+Even within the tailnet, access is restricted:
+
+```
+Internet ──X──▶ Services (no public access)
+
+Tailnet:
+  Admin ────────▶ All services
+  Member ───────▶ User-facing services only
+  Homelab tag ──▶ NAS (for backups)
+```
+
+See [[tailscale]] for the full ACL matrix.
+
+## Secrets Management
+
+Secrets have a single source of truth and a separate delivery path for each consumer:
+
+### Source of Truth: 1Password
+
+All secrets originate in 1Password's `blumeops` vault:
+- API keys, tokens, passwords
+- SSH keys and certificates
+- OAuth credentials
+
+### Kubernetes: External Secrets Operator
+
+[[external-secrets]] syncs secrets from 1Password to Kubernetes:
+
+```
+1Password ──▶ 1Password Connect ──▶ ExternalSecret ──▶ K8s Secret
+```
+
+Services reference native Kubernetes Secrets; they don't know about 1Password.
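+
+For example, a container can read the synced value through an ordinary `secretKeyRef`. This is a sketch: the Deployment, image reference, Secret name, key, and environment variable are all hypothetical.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: miniflux
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: miniflux
+  template:
+    metadata:
+      labels:
+        app: miniflux
+    spec:
+      containers:
+        - name: miniflux
+          image: miniflux/miniflux:latest   # illustrative image reference
+          env:
+            - name: DB_PASSWORD             # hypothetical variable name
+              valueFrom:
+                secretKeyRef:
+                  name: miniflux-db         # the synced K8s Secret (hypothetical name)
+                  key: password
+```
+
+Nothing in the pod spec mentions 1Password, so the secret backend can change without touching the workload.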
+
+### Ansible: op CLI
+
+Ansible playbooks fetch secrets at runtime via the `op` CLI:
+
+```yaml
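+- name: Fetch secret
+  command: op item get <id> --fields password --reveal
+  delegate_to: localhost
+  register: op_secret   # illustrative variable name
+  changed_when: false   # a read-only lookup never changes the host
+  no_log: true          # keep the secret out of task output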
+```
+
+Secrets are held in memory as Ansible facts, never written to disk.
+
+### Git Repository
+
+The repository is public. Secrets must never be committed:
+- `.gitignore` excludes sensitive patterns
+- Pre-commit hooks scan for potential secrets (TruffleHog)
+- All config files use references to secrets, not values
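+
+A local TruffleHog hook in `.pre-commit-config.yaml` might look like the following. This is a sketch based on the hook pattern in TruffleHog's own documentation, not the repository's actual configuration, and it assumes the `trufflehog` binary is installed on the workstation.
+
+```yaml
+repos:
+  - repo: local
+    hooks:
+      - id: trufflehog
+        name: TruffleHog
+        # Limit the scan to changes since HEAD; --fail exits non-zero on findings.
+        entry: bash -c 'trufflehog git file://. --since-commit HEAD --fail'
+        language: system        # use the locally installed binary
+        pass_filenames: false   # the scan targets the repo, not individual files
+```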
+
+## Access Control Philosophy
+
+### Principle of Least Privilege
+
+Services and devices get minimum necessary access:
+
+| Entity | Access |
+|--------|--------|
+| Admin users | Everything |
+| Member users | User-facing services only |
+| Homelab servers | Only what they need (NAS for backups) |
+| K8s pods | No Tailscale access (use Caddy proxy) |
+
+### Tagged Devices vs User Devices
+
+Important Tailscale concept:
+- **User devices** (like gilbert) have user identity and inherit user ACLs
+- **Tagged devices** (like indri with `tag:homelab`) lose user identity
+
+Don't tag user devices - it breaks user-based access rules.
+
+## Authentication Patterns
+
+### Service-to-Service
+
+Internal services use:
+- Kubernetes service discovery (no auth needed within cluster)
+- Tailscale identity for cross-host communication
+
+### User-to-Service
+
+Users authenticate via:
+- Service-specific credentials (stored in 1Password)
+- Tailscale identity, for services that support it (planned)
+
+### AI/Automation Access
+
+Claude Code and automation use:
+- SSH keys for git operations
+- ArgoCD tokens for deployments
+- 1Password CLI for secret retrieval (requires user approval)
+
+## What's Not Protected
+
+Honest assessment of security boundaries:
+
+- **Local network attacks** - If someone is on your home WiFi, they could potentially access the NAS directly
+- **Physical access** - No disk encryption on servers (trade-off for reliability)
+- **Supply chain** - Container images from upstream registries
+- **Operator error** - Misconfigured ACLs or leaked credentials
+
+The model assumes a trusted home network and focuses on protecting against internet-based attacks.
+
+## Related
+
+- [[tailscale]] - ACL configuration
+- [[1password]] - Secrets management
+- [[external-secrets]] - Kubernetes secrets
+- [[architecture]] - Overall system design
--- a/docs/explanation/why-gitops.md
+++ b/docs/explanation/why-gitops.md
@ -0,0 +1,70 @@
+---
+title: why-gitops
+tags:
+  - explanation
+  - philosophy
+---
+
+# Why GitOps?
+
+> **Note:** This article was drafted by AI and reviewed by Erich. I plan to rewrite all explanatory content in my own words - these serve as placeholders to establish the documentation structure.
+
+BlumeOps uses GitOps principles for managing personal infrastructure. This might seem like overkill for a homelab, but there are good reasons.
+
+## The Problem with Manual Infrastructure
+
+Traditional server management involves SSHing into machines and running commands. This works, but creates problems:
+
+- **Drift**: The actual state diverges from what you think it is
+- **Amnesia**: You forget what you changed and why
+- **Fragility**: One bad command can break things with no easy rollback
+- **Bus factor**: Only you know how it works (even AI assistants struggle without context)
+
+## Git as the Source of Truth
+
+GitOps inverts the model: instead of pushing changes to servers, you commit desired state to Git, and automation pulls it into reality.
+
+**Benefits:**
+- Every change is tracked with commit history
+- Pull requests enable review before deployment
+- Rolling back configuration is just `git revert`
+- The repo *is* the documentation
+
+## Why This Matters for a Homelab
+
+A personal homelab isn't a production environment, but it shares the same challenges:
+
+1. **Memory is unreliable** - Six months from now, you won't remember why you configured Caddy that way
+2. **Experimentation is constant** - You try things, break things, want to undo things
+3. **AI assistance needs context** - Claude can help much more effectively when it can read your infrastructure as code
+
+## The BlumeOps Approach
+
+BlumeOps uses layered GitOps:
+
+| Layer | Tool | What it manages |
+|-------|------|-----------------|
+| **Tailnet** | [[reference/infrastructure/tailscale\|Pulumi]] | ACLs, tags, DNS |
+| **Host config** | [[reference/ansible/roles\|Ansible]] | Services on [[indri]] |
+| **Kubernetes** | [[argocd\|ArgoCD]] | Containerized workloads |
+
+Each layer is applied differently, and only ArgoCD reconciles continuously:
+- Pulumi applies when you run `mise run tailnet-up`
+- Ansible applies when you run `mise run provision-indri`
+- ArgoCD watches Git and syncs manually or automatically
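+
+Whether ArgoCD syncs on its own is a per-application setting. A sketch of the relevant fragment of an `Application` spec (the field names are ArgoCD's; which BlumeOps applications enable it is not stated here):
+
+```yaml
+spec:
+  syncPolicy:
+    automated:
+      prune: true      # delete resources that were removed from Git
+      selfHeal: true   # revert manual changes made directly in the cluster
+```
+
+Omitting `syncPolicy.automated` leaves the application in manual mode, where drift is reported but applied only on request.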
+
+## Trade-offs
+
+GitOps isn't free:
+
+- **Learning curve** - You need to understand Ansible, ArgoCD, Pulumi
+- **Indirection** - Can't just `apt install` something; need to add it to config
+- **Complexity** - More moving parts than a simple server
+
+But for BlumeOps, the trade-off is worth it. The infrastructure is complex enough that managing it imperatively would be error-prone, and the GitOps approach enables effective AI-assisted operations.
+
+## Related
+
+- [[architecture]] - How the pieces fit together
+- [[argocd]] - Kubernetes GitOps
+- [[reference/ansible/roles|Ansible roles]] - Host configuration