Add Phase 3 tutorials with audience targeting (#94)

## Summary

- Create tutorials directory structure with index page
- Add 5 main tutorials targeting different audiences:
  - **what-is-blumeops** (Reader, AI) - High-level orientation
  - **exploring-the-docs** (All) - Navigation guide
  - **ai-assistance-guide** (AI, Owner) - Context for AI-assisted operations
  - **contributing** (Contributor) - First contribution workflow
  - **replicating-blumeops** (Replicator) - Overview for building similar setup
- Add 4 replication sub-tutorials:
  - tailscale-setup, kubernetes-bootstrap, argocd-config, observability-stack
- Update README.md to mark Phase 3 complete
- Add changelog fragment

Each tutorial explicitly identifies its target audiences and links to reference material rather than re-explaining concepts.

## Deployment and Testing

- [x] All pre-commit hooks pass (doc-links validates wiki links)
- [ ] Build docs via workflow to verify rendering
- [ ] Review content for accuracy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/94
2026-02-03 18:51:57 -08:00 · 2026-02-03 18:51:57 -08:00 · 7ebac4aef6
commit 7ebac4aef6
parent bf03d71780
20 changed files with 1864 additions and 10 deletions
--- a/docs/tutorials/replication/observability-stack.md
+++ b/docs/tutorials/replication/observability-stack.md
@@ -0,0 +1,231 @@
+---
+title: observability-stack
+tags:
+  - tutorials
+  - replication
+  - observability
+---
+
+# Building the Observability Stack
+
+> **Audiences:** Replicator
+
+This tutorial walks through deploying metrics, logs, and dashboards for your homelab - because you can't fix what you can't see.
+
+## The Stack
+
+A complete observability stack, as built here, has four components:
+
+| Component | Purpose | BlumeOps Uses |
+|-----------|---------|---------------|
+| **Metrics** | Numeric measurements over time | [[prometheus]] |
+| **Logs** | Text output from applications | [[loki]] |
+| **Dashboards** | Visualization and alerting | [[grafana]] |
+| **Collection** | Gathering and forwarding data | [[alloy]] |
+
+For BlumeOps specifics, see [[observability|Observability Reference]].
+
+## Step 1: Create Monitoring Namespace
+
+```bash
+kubectl create namespace monitoring
+```
+
+## Step 2: Deploy Prometheus
+
+Prometheus collects and stores metrics.
+
+### Using Helm
+
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm install prometheus prometheus-community/prometheus \
+  --namespace monitoring \
+  --set server.persistentVolume.size=10Gi
+```
+
+### Or via ArgoCD
+
+Create an Application that pulls the chart from the upstream Helm repository and sets its values inline:
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: prometheus
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://prometheus-community.github.io/helm-charts
+    chart: prometheus
+    targetRevision: 25.0.0
+    helm:
+      values: |
+        server:
+          persistentVolume:
+            size: 10Gi
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: monitoring
+```
+
+### Verify
+
+```bash
+kubectl -n monitoring get pods -l app.kubernetes.io/name=prometheus
+```
+
+## Step 3: Deploy Loki
+
+Loki stores and queries logs, using the same label model that Prometheus uses for metrics.
+
+```bash
+helm repo add grafana https://grafana.github.io/helm-charts
+helm install loki grafana/loki-stack \
+  --namespace monitoring \
+  --set loki.persistence.enabled=true \
+  --set loki.persistence.size=10Gi
+```
+
+This also installs Promtail for log collection from pods.
+
+## Step 4: Deploy Grafana
+
+Grafana provides dashboards and visualization.
+
+```bash
+helm install grafana grafana/grafana \
+  --namespace monitoring \
+  --set persistence.enabled=true \
+  --set persistence.size=1Gi \
+  --set adminPassword=admin  # Change this!
+```
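+
+If you would rather not pass the password on the command line, a values file can point the chart at an existing Secret instead. This is a minimal sketch, assuming a Secret named `grafana-admin` (a hypothetical name) already exists in the `monitoring` namespace with `admin-user` and `admin-password` keys:
+
+```yaml
+# grafana-values.yaml (hypothetical filename)
+persistence:
+  enabled: true
+  size: 1Gi
+admin:
+  existingSecret: grafana-admin   # assumed Secret, created separately
+  userKey: admin-user
+  passwordKey: admin-password
+```
+
+Pass it with `-f grafana-values.yaml` in place of the `--set` flags above.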
+
+### Configure Data Sources
+
+After installation, add data sources in the Grafana UI or via a labeled ConfigMap that the chart's datasource sidecar loads:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-datasources
+  namespace: monitoring
+  labels:
+    grafana_datasource: "1"
+data:
+  datasources.yaml: |
+    apiVersion: 1
+    datasources:
+    - name: Prometheus
+      type: prometheus
+      url: http://prometheus-server.monitoring.svc:80
+      isDefault: true
+    - name: Loki
+      type: loki
+      url: http://loki.monitoring.svc:3100
+```
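+
+The label on this ConfigMap only has an effect when the Grafana chart's sidecar is watching for it, which the install command in Step 4 does not turn on. A sketch of the extra chart values, assuming the chart's default label name `grafana_datasource`:
+
+```yaml
+# Additional Grafana chart values (sketch)
+sidecar:
+  datasources:
+    enabled: true              # load ConfigMaps labeled grafana_datasource
+    label: grafana_datasource  # chart default, shown for clarity
+```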
+
+## Step 5: Access Grafana
+
+Expose via Tailscale:
+```bash
+kubectl -n monitoring port-forward svc/grafana 3000:80 &
+tailscale serve --bg --https 3000 http://localhost:3000
+```
+
+Or create an Ingress.
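+
+A minimal Ingress sketch, assuming an ingress controller is already installed. The hostname `grafana.example.internal` and the class name `nginx` are placeholders, not values taken from BlumeOps:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: grafana
+  namespace: monitoring
+spec:
+  ingressClassName: nginx            # assumption: use your controller's class
+  rules:
+  - host: grafana.example.internal   # placeholder hostname
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: grafana            # Service created by the Helm release in Step 4
+            port:
+              number: 80
+```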
+
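+Log in as `admin`. The password is the one you set in Step 4; if you set none, the chart generates one and stores it under the `admin-password` key of the `grafana` Secret in the `monitoring` namespace.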
+
+## Step 6: Add Dashboards
+
+Import community dashboards from [grafana.com/grafana/dashboards](https://grafana.com/grafana/dashboards/):
+
+| Dashboard | ID | Shows |
+|-----------|-----|-------|
+| Node Exporter Full | 1860 | Host metrics |
+| Kubernetes Cluster | 7249 | Cluster overview |
+| Loki Logs | 13639 | Log exploration |
+
+In Grafana, go to Dashboards > Import and enter the dashboard ID.
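+
+Dashboards can also be provisioned declaratively rather than imported by hand. The following is a sketch of Grafana chart values, assuming the `Prometheus` data source name from Step 4. The keys `node-exporter-full` and `kubernetes-cluster` are arbitrary labels chosen for this example:
+
+```yaml
+# Additional Grafana chart values (sketch)
+dashboardProviders:
+  dashboardproviders.yaml:
+    apiVersion: 1
+    providers:
+    - name: default
+      orgId: 1
+      folder: ""
+      type: file
+      disableDeletion: false
+      editable: true
+      options:
+        path: /var/lib/grafana/dashboards/default
+dashboards:
+  default:                    # must match the provider name above
+    node-exporter-full:
+      gnetId: 1860            # add `revision: N` to pin a specific revision
+      datasource: Prometheus
+    kubernetes-cluster:
+      gnetId: 7249
+      datasource: Prometheus
+```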
+
+## Step 7: Deploy Alloy (Optional)
+
+Grafana Alloy is a unified collector that replaces multiple agents (Promtail, node_exporter, etc.).
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: alloy
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://grafana.github.io/helm-charts
+    chart: alloy
+    targetRevision: 0.1.0
+    helm:
+      values: |
+        alloy:
+          configMap:
+            content: |
+              // Alloy configuration here
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: monitoring
+```
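+
+As an illustration of what could replace the placeholder comment, here is a sketch of chart values that tail pod logs and push them to the Loki service from Step 3. It is not the BlumeOps configuration; the component labels (`pods`, `default`) are arbitrary, and the Loki URL assumes the service name used in the data source above:
+
+```yaml
+alloy:
+  configMap:
+    content: |
+      // Discover pods through the Kubernetes API
+      discovery.kubernetes "pods" {
+        role = "pod"
+      }
+
+      // Tail logs of the discovered pods and hand them to the writer below
+      loki.source.kubernetes "pods" {
+        targets    = discovery.kubernetes.pods.targets
+        forward_to = [loki.write.default.receiver]
+      }
+
+      // Push log entries to Loki
+      loki.write "default" {
+        endpoint {
+          url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
+        }
+      }
+```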
+
+BlumeOps uses Alloy on both [[indri]] (for host metrics, via [[reference/ansible/roles|Ansible role]]) and in the [[cluster]] (for pod logs and service probes).
+
+## What You Now Have
+
+- Metrics collection and storage (Prometheus)
+- Log aggregation (Loki)
+- Dashboards and visualization (Grafana)
+- Foundation for alerting
+
+## Adding Alerts
+
+Configure alerting rules in Prometheus:
+
+```yaml
+groups:
+- name: example
+  rules:
+  - alert: HighMemoryUsage
+    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "High memory usage detected"
+```
+
+Then configure notification channels in Grafana (email, Slack, PagerDuty, etc.).
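+
+One way to load the rule group shown earlier in this section, assuming Prometheus was installed from the `prometheus-community/prometheus` chart in Step 2, is to place it under the chart's `serverFiles` values. A sketch:
+
+```yaml
+# Additional Prometheus chart values (sketch)
+serverFiles:
+  alerting_rules.yml:
+    groups:
+    - name: example
+      rules:
+      - alert: HighMemoryUsage
+        # Fires when less than 10% of memory has been available for 5 minutes
+        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High memory usage detected"
+```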
+
+## Next Steps
+
+- Create custom dashboards for your services
+- Set up alerting for critical conditions
+- Add service-specific metrics exporters
+
+## BlumeOps Specifics
+
+BlumeOps' observability setup includes:
+- Prometheus scraping all services via annotations
+- Loki collecting logs from all pods and [[indri]] services
+- Custom dashboards for [[jellyfin]], [[teslamate]], and cluster health
+- [[alloy]] running on both host and in-cluster
+
+See [[observability|Observability Reference]] for full details.
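+
+Annotation-based scraping usually follows the `prometheus.io/*` convention that the community Prometheus chart's default scrape jobs understand. The exact annotations BlumeOps uses are not shown here, so treat this as a generic sketch with a hypothetical port:
+
+```yaml
+# Pod template metadata (sketch)
+metadata:
+  annotations:
+    prometheus.io/scrape: "true"
+    prometheus.io/port: "8080"      # hypothetical metrics port
+    prometheus.io/path: "/metrics"
+```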
+
+## Troubleshooting
+
+| Problem | Solution |
+|---------|----------|
+| No metrics appearing | Check Prometheus targets (`/targets` endpoint) |
+| No logs in Loki | Verify Promtail/Alloy is collecting (`/ready` endpoint) |
+| Dashboard shows no data | Check data source configuration and time range |
+| High storage usage | Adjust retention settings in Prometheus/Loki |
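+
+For the Prometheus side of the storage row above, retention can be set through the chart values from Step 2. A sketch, with `15d` as an example value to adjust to your disk size:
+
+```yaml
+# Additional Prometheus chart values (sketch)
+server:
+  retention: 15d            # example: keep metrics for 15 days
+  persistentVolume:
+    size: 10Gi
+```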