Add docs/ directory with blumeops zk cards

Move 21 blumeops-tagged zettelkasten cards from ~/code/personal/zk/ to docs/ in this repository. These files are symlinked back into the zk at ~/code/personal/zk/blumeops for seamless obsidian.nvim integration.

This enables:
- Git-managed documentation in the blumeops repo
- Preserved wiki links between blumeops docs
- obsidian-sync isolation (docs don't sync to other devices)
- Direct editing via obsidian.nvim with the blumeops workspace

Also updates zk-docs mise task to read from local docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:09:19 -08:00
commit a7d771d945
parent 4d97ac4c26
22 changed files with 2416 additions and 3 deletions
--- a/docs/1767747119-YCPO.md
+++ b/docs/1767747119-YCPO.md
@@ -0,0 +1,247 @@
+---
+id: 1767747119-YCPO
+aliases:
+  - blumeops
+  - BlumeOps
+tags:
+  - blumeops
+---
+
+BlumeOps, aka Blue Mops, refers to my own personal computing operations stack.
+
+Source code: https://forge.ops.eblu.me/eblume/blumeops (mirrored to https://github.com/eblume/blumeops)
+
+# Infrastructure
+
+| Host                             | Description              | Notes                                              |
+|----------------------------------|--------------------------|----------------------------------------------------|
+| **[[indri|Indri]]**             | Mac Mini M1, 2020        | Primary server, 2TB internal disk                  |
+| **[Sifaka](https://nas.ops.eblu.me)** | Synology NAS           | 10.9TB RAID 5, backup target                       |
+| **Gilbert**                      | 13" MacBook Air M4, 2025 | Primary workstation                                |
+| **Mouse**                        | 13" MacBook Air M2       | Allison's laptop                                   |
+| **[UniFi](https://192.168.1.1)** | UniFi Express 7          | Home WiFi network ([cloud](https://unifi.ui.com)) |
+| **Dwarf**                        | iPad Air                 | Employer-provided, off tailnet                     |
+
+All devices are connected via [Tailscale](https://login.tailscale.com/) tailnet `tail8d86e.ts.net`.
+
+## Tailscale Access Control
+
+ACLs are managed via Pulumi in `pulumi/policy.hujson`. See [[pulumi]] for deployment commands.
+
+**Important lesson learned:**
+- Don't tag user-owned devices (like gilbert) - tagging converts them to "tagged devices" which lose user identity and break user-based SSH rules
+
+### Groups
+
+| Group               | Members                                    | Purpose                          |
+|---------------------|--------------------------------------------|----------------------------------|
+| `group:allisonflix` | blume.erich@gmail.com, acmdavis@gmail.com  | Jellyfin media access            |
+
+### Device Tags
+
+| Tag              | Devices     | Purpose                                    |
+|------------------|-------------|--------------------------------------------|
+| `tag:homelab`    | indri       | Server infrastructure                      |
+| `tag:nas`        | sifaka      | Network-attached storage for backups       |
+| `tag:blumeops`   | indri, sifaka | Resources managed by Pulumi IaC          |
+| `tag:registry`   | indri       | Container registry access                  |
+| `tag:k8s-api`    | indri       | Kubernetes API server access               |
+
+### Access Matrix
+
+| Source                   | Kiwix | Forge | PyPI | Miniflux | PostgreSQL | NAS | Grafana | Loki |
+|--------------------------|-------|-------|------|----------|------------|-----|---------|------|
+| `autogroup:admin`        | Y     | Y     | Y    | Y        | Y          | Y   | Y       | Y    |
+| `autogroup:member`       | Y     | Y     | Y    | Y        | Y          | -   | -       | -    |
+| `tag:homelab`            | -     | -     | -    | -        | -          | Y   | -       | -    |
+
+Notes:
+- **Admins** - full access to all services via `autogroup:admin`
+- **Allison** (`acmdavis@gmail.com`) - member services only, no Grafana/Loki/NAS
+
+### SSH Access
+
+| Source                  | Destinations    | Auth        |
+|-------------------------|-----------------|-------------|
+| `autogroup:member`      | `autogroup:self`| check       |
+| `autogroup:admin`       | `tag:homelab`   | check (12h) |
+| `autogroup:admin`       | `tag:nas`       | check (12h) |
+
+# Services
+
+Services are accessible via two DNS domains:
+- **`*.ops.eblu.me`** - Caddy reverse proxy (reachable from k8s pods, docker containers, and tailnet)
+- **`*.tail8d86e.ts.net`** - Tailscale MagicDNS (tailnet clients only, not from k8s/docker)
+
+## Caddy Services (`*.ops.eblu.me`)
+
+Caddy proxies to k8s services via their Tailscale endpoints (traffic stays local on indri).
+Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs work - use ops.eblu.me for access from pods/containers.
+
+| Service        | URL                               | Description                        | Management Log  |
+|----------------|-----------------------------------|------------------------------------|-----------------|
+| **Homepage**   | https://go.ops.eblu.me            | Service dashboard / start page     | —               |
+| **Forgejo**    | https://forge.ops.eblu.me         | Git hosting (SSH: port 2222)       | [[forgejo]]     |
+| **Registry**   | https://registry.ops.eblu.me      | OCI container registry (Zot)       | [[zot]]         |
+| **Sifaka NAS** | https://nas.ops.eblu.me           | Synology NAS dashboard             | —               |
+| **Grafana**    | https://grafana.ops.eblu.me       | Dashboards & observability (k8s)   | [[grafana]]     |
+| **ArgoCD**     | https://argocd.ops.eblu.me        | GitOps continuous delivery (k8s)   | [[argocd]]      |
+| **Prometheus** | https://prometheus.ops.eblu.me    | Metrics collection (k8s)           | [[prometheus]]  |
+| **Loki**       | https://loki.ops.eblu.me          | Log aggregation (k8s)              | [[loki]]        |
+| **Miniflux**   | https://feed.ops.eblu.me          | RSS/Atom feed reader (k8s)         | [[miniflux]]    |
+| **PyPI**       | https://pypi.ops.eblu.me          | PyPI caching proxy (devpi, k8s)    | [[pypi]]        |
+| **Kiwix**      | https://kiwix.ops.eblu.me         | Offline Wikipedia & ZIM (k8s)      | [[argocd]]      |
+| **Torrent**    | https://torrent.ops.eblu.me       | BitTorrent daemon web UI (k8s)     | [[argocd]]      |
+| **TeslaMate**  | https://tesla.ops.eblu.me         | Tesla data logger (k8s)            | [[teslamate]]   |
+| **Immich**     | https://photos.ops.eblu.me        | Photo management (k8s Helm, CNPG)  | [[argocd]]      |
+| **DJ**         | https://dj.ops.eblu.me            | Music streaming server (Navidrome) | [[navidrome]]   |
+| **PostgreSQL** | pg.ops.eblu.me:5432               | Database server (k8s CloudNativePG)| [[postgresql]]  |
+
+## Tailscale-Only Services (`*.tail8d86e.ts.net`)
+
+These services are only accessible via Tailscale (not from k8s pods/containers):
+
+| Service        | URL                               | Description                        | Management Log  |
+|----------------|-----------------------------------|------------------------------------|-----------------|
+| **Kubernetes** | https://k8s.tail8d86e.ts.net      | Minikube API (TCP passthrough)     | [[minikube]]    |
+| **Jellyfin**   | https://jellyfin.ops.eblu.me      | Media server (VideoToolbox HW)     | [[jellyfin]]    |
+
+Supporting services (not directly user-facing):
+
+| Service             | Description                           | Management Log   |
+|---------------------|---------------------------------------|------------------|
+| **Alloy (indri)**   | Metrics & logs collector (indri host) | [[alloy]]        |
+| **Alloy (k8s)**     | Pod log collection & service probes   | [[alloy]]        |
+| **Kube-state-metrics** | K8s resource metrics (pods, deployments) | [[prometheus]] |
+| **Borgmatic**       | Daily backups to Sifaka NAS (2:00 AM) | [[borgmatic]]    |
+
+## Port Map (Indri)
+
+| Port  | Service       | Protocol | Binding     | Notes                                      |
+|-------|---------------|----------|-------------|--------------------------------------------|
+| 443   | Caddy         | HTTPS    | 0.0.0.0     | Reverse proxy for `*.ops.eblu.me`          |
+| 2222  | Caddy L4      | TCP      | 0.0.0.0     | SSH proxy → Forgejo (localhost:2200)       |
+| 5432  | Caddy L4      | TCP      | 0.0.0.0     | PostgreSQL proxy → k8s pg                  |
+| 2200  | Forgejo SSH   | TCP      | localhost   | Built-in SSH server                        |
+| 3001  | Forgejo       | HTTP     | localhost   | Web UI (proxied by Caddy)                  |
+| 5050  | Zot           | HTTP     | localhost   | Registry API (proxied by Caddy)            |
+| 8096  | Jellyfin      | HTTP     | localhost   | Media server (proxied by Caddy)            |
+| 44491 | K8s API       | HTTPS    | 0.0.0.0     | Minikube API server (via Tailscale k8s.*)  |
+
+# Service Management
+
+## Pulumi (Tailnet IaC)
+
+Tailnet-wide configuration (ACLs, tags, DNS) is managed via Pulumi. See [[pulumi]] for details.
+
+```bash
+mise run tailnet-preview   # preview ACL changes
+mise run tailnet-up        # apply ACL changes
+```
+
+Edit `pulumi/policy.hujson` to modify ACLs or add new tags.
+
+## Ansible
+
+Services on Indri are managed via ansible. Playbooks live in the `ansible/` directory of the blumeops repo:
+
+```bash
+mise run provision-indri        # runs ansible/playbooks/indri.yml
+mise run indri-services-check   # checks health of all services
+```
+
+Run with `--check --diff` first to preview changes, or target specific services:
+
+```bash
+mise run provision-indri -- --check --diff          # dry run
+mise run provision-indri -- --tags alloy            # only alloy
+mise run provision-indri -- --tags zot,borgmatic    # multiple tags
+```
+
+## Adding a New Service
+
+### Indri Services (via Caddy)
+
+For services running directly on indri that need to be accessible from k8s pods:
+
+1. Run the service bound to localhost (e.g., localhost:3000)
+2. Add service to `ansible/roles/caddy/defaults/main.yml` under `caddy_services`
+3. Run `mise run provision-indri -- --tags caddy`
+4. Add backup entry in borgmatic role if needed
+
+DNS is handled by a wildcard record (`*.ops.eblu.me` → indri's Tailscale IP) managed via Pulumi in `pulumi/gandi/`.
+
+Access via `https://foo.ops.eblu.me`.
+
+### K8s Services (via Tailscale Ingress)
+
+For services running in minikube:
+
+1. Create Kubernetes manifests in `argocd/manifests/<service>/`
+2. Add ArgoCD Application in `argocd/apps/<service>.yaml`
+3. Add Tailscale Ingress annotation for `*.tail8d86e.ts.net` hostname
+4. Add Homepage annotations to the Ingress for dashboard discovery (see below)
+5. Add Caddy proxy entry in `ansible/roles/caddy/defaults/main.yml`
+6. Sync via ArgoCD: `argocd app sync <service>`
+
+Access via `https://foo.ops.eblu.me` (preferred) or `https://foo.tail8d86e.ts.net`.
+
+**Note:** K8s services using Tailscale Ingress are NOT accessible from other k8s pods or docker containers. Use Caddy (`*.ops.eblu.me`) if pod-to-service communication is needed.
+
+**Homepage annotations** for automatic dashboard discovery:
+```yaml
+annotations:
+  gethomepage.dev/enabled: "true"
+  gethomepage.dev/name: "My Service"
+  gethomepage.dev/group: "Apps"
+  gethomepage.dev/icon: "myservice.png"
+  gethomepage.dev/description: "Short description"
+  gethomepage.dev/href: "https://myservice.ops.eblu.me"
+  gethomepage.dev/pod-selector: "app=myservice"
+```
+
+Icons use [Dashboard Icons](https://github.com/walkxcode/dashboard-icons) format (e.g., `grafana.png`, `prometheus.png`). The `pod-selector` annotation enables pod status badges on the dashboard.
+
+## Secrets Management
+
+Kubernetes secrets are managed via [[external-secrets|External Secrets Operator]], which syncs from 1Password via 1Password Connect.
+
+To add a secret to a k8s service:
+1. Ensure the 1Password item exists in the `blumeops` vault
+2. Create an `ExternalSecret` manifest in the service's directory
+3. Reference the `onepassword-blumeops` ClusterSecretStore
+4. Sync via ArgoCD
+
+See [[external-secrets]] for detailed usage and bootstrap instructions.
+
+# Notes
+
+## Go DNS Resolution on macOS
+
+**Important lesson learned (2026-01-22):**
+Go programs built with `CGO_ENABLED=0` (pure Go) use a DNS resolver that reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files. This breaks Tailscale MagicDNS resolution.
+
+**Solution:** Build Go programs with `CGO_ENABLED=1` to use the macOS native resolver. This is why [[alloy|Grafana Alloy]] is built from source rather than using the Homebrew bottle.
+
+## Remote Kubernetes Access (from Gilbert)
+
+The minikube cluster on indri is accessible from gilbert via a Tailscale service.
+Cluster was created with `--apiserver-names=k8s.tail8d86e.ts.net,indri --listen-address=0.0.0.0`.
+API server exposed at `https://k8s.tail8d86e.ts.net` via TCP passthrough (preserves mTLS).
+
+**Fish abbreviations** (in `~/.config/fish/config.fish`):
+- `ki` -> `kubectl --context=minikube-indri`
+- `k9i` -> `k9s --context=minikube-indri`
+- `k9` -> `k9s`
+
+```bash
+# Quick access via abbreviations
+ki get nodes
+k9i
+
+# Or explicitly set context
+kubectl config use-context minikube-indri
+kubectl get nodes
+```
+
+Credentials are stored in 1Password and fetched via exec credential plugin. See [[minikube]] for details.
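The `ki` and `k9i` abbreviations in this card are fish-only. A minimal sketch of bash/zsh equivalents, assuming the same `minikube-indri` context name; the function bodies and the `KUBE_CONTEXT_INDRI` variable are illustrative and not part of the repository:

```bash
# Hypothetical bash/zsh counterparts to the fish abbreviations (not from the repo).
# Each wrapper pins the kubeconfig context and forwards all remaining arguments.
KUBE_CONTEXT_INDRI="minikube-indri"   # context name taken from the card

ki() {
  kubectl --context="$KUBE_CONTEXT_INDRI" "$@"
}

k9i() {
  k9s --context="$KUBE_CONTEXT_INDRI" "$@"
}
```

Sourced from `~/.bashrc` or `~/.zshrc`, `ki get nodes` then behaves like the fish abbreviation, though unlike an abbreviation it does not expand on the command line.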
--- a/docs/1768246525-RVRY.md
+++ b/docs/1768246525-RVRY.md
@@ -0,0 +1,136 @@
+---
+id: 1768246525-RVRY
+aliases:
+  - forgejo
+  - forge
+tags:
+  - blumeops
+  - forgejo
+  - git
+  - scm
+  - forge
+---
+
+# Mon Jan 12 11:35
+
+```fish
+❯ brew install forgejo
+❯ brew --prefix forgejo
+/opt/homebrew/opt/forgejo
+❯ brew services start forgejo
+==> Successfully started `forgejo` (label: homebrew.mxcl.forgejo)
+```
+
+From the service definition I can see that this runs as:
+
+```bash
+/opt/homebrew/opt/forgejo/bin/forgejo web --work-path /opt/homebrew/var/forgejo > /opt/homebrew/var/log/forgejo.log 2> /opt/homebrew/var/log/forgejo.log
+```
+It sounds from the docs like this means the config file should live at:
+```
+/opt/homebrew/var/forgejo/custom/conf/app.ini
+```
+Ah, based on the logs, it looks like forgejo has picked port 3000 which is used by grafana:
+```
+❯ lsof -nP -iTCP:3000 -sTCP:LISTEN
+COMMAND  PID       USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
+grafana 1530 erichblume   15u  IPv6 0x4acfad8b21dcb063      0t0  TCP *:3000 (LISTEN)
+```
+Ok I've set a basic config for port 3001, and then gone through the basic app setup. Looks like it's working! Not sure how SSH works yet though. Let's get this service registered.
+
+Ok so the next issue is that I want to use ssh as my primary git interface, and
+I want that to look to users like I'm using port 22 but I want to host it on
+indri which has its own separate ssh setup. Hmm. Let's tell forgejo to use port
+2200. Ah perfect, we can set SSH_PORT to 22 and SSH_LISTEN_PORT to 2200.
+
+Hmm, let's stop running this as me and run as a new user, 'forgejo'.
+```fish
+sudo sysadminctl -addUser forgejo -system -shell /usr/bin/false
+sudo chown -R forgejo:staff /opt/homebrew/var/forgejo
+```
+Ok, I think I need to switch all my services on this host over to a services file.
+
+Wow, missing from the above is like 4 hours of deep diving into the particulars of tailscale service definition hosting. In the end, I never got a services file to work - and yes, I did remember to advertise! Adding to the complexity is that I didn't discover until the end that you can't do "hairpinning", i.e. you CANNOT use the tailnet service name from the host doing the hosting. I probably had it fixed at some point hours ago and ruled it out because I didn't know about the hairpinning issue. So anyway... what ended up working was to just use the cli:
+```fish
+tailscale serve --service="svc:forge" --tcp=22 tcp://localhost:2200
+tailscale serve --service="svc:forge" --https=443 http://localhost:3001
+```
+That's it. Nothing else needed, worked right away. Sheesh. (Ok there was also a
+solid hour spent on permission issues... I honestly don't know how it's working
+now, as there is now a `forgejo` user and the config says to use it but the
+files are all owned by `erichblume:staff` but with group permissions set... in
+any case, it friggin' works. So I'm happy.)
+
+# Configuration (Ansible-Managed)
+
+As of 2026-01-23, the `app.ini` is managed by ansible:
+- Template: `ansible/roles/forgejo/templates/app.ini.j2`
+- Secrets fetched from 1Password in playbook pre_tasks
+- Secrets item: "Forgejo Secrets" in blumeops vault (fields: `lfs-jwt-secret`, `internal-token`, `oauth2-jwt-secret`, `runner_reg`)
+
+Deploy config changes:
+```bash
+mise run provision-indri -- --tags forgejo
+```
+
+# Forgejo Actions (CI/CD)
+
+## Runner (k8s)
+
+The Forgejo runner runs in Kubernetes with Docker-in-Docker (DinD) for container builds.
+
+**Architecture:**
+- Runner daemon + DinD sidecar in a single pod
+- Jobs execute in containers using the `k8s` label
+- DinD exposes Docker API on `tcp://127.0.0.1:2375`
+- Pods reach `*.ops.eblu.me` services via Caddy reverse proxy
+
+**Components:**
+- ArgoCD app: `argocd/apps/forgejo-runner.yaml`
+- Manifests: `argocd/manifests/forgejo-runner/`
+- Job image: `registry.ops.eblu.me/blumeops/forgejo-runner` (Node.js + Docker CLI)
+- Job image source: `containers/forgejo-runner/`
+
+**Deployment:**
+```bash
+# Apply secret (contains runner token from 1Password)
+op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
+
+# Sync via ArgoCD
+argocd app sync forgejo-runner
+```
+
+**View logs:**
+```bash
+kubectl --context=minikube-indri logs -n forgejo-runner -l app=forgejo-runner -c runner
+```
+
+## Container Build Workflow
+
+Container images are built via `.forgejo/workflows/build-container.yaml`, triggered by tags matching `<container>-v<version>`.
+
+**Release a container:**
+```bash
+mise run container-list                            # See available containers
+mise run container-tag-and-release nettest v1.0.0  # Tag and trigger build
+```
+
+**Test container** (`containers/nettest/`): Network connectivity test for debugging CI/CD.
+
+## Workflows
+
+Workflows live in `.forgejo/workflows/` (not `.github/workflows/`).
+
+**Important**: Use `github.*` context variables, not `gitea.*`. Forgejo supports both at runtime, but:
+1. The Forgejo web UI schema validator only recognizes `github.*`
+2. The `actionlint` pre-commit hook, which validates workflows locally before push, also expects `github.*`
+
+Separately, pass untrusted inputs (like `github.head_ref`) through env vars rather than interpolating them into scripts, for security.
+
+## Runner Token
+
+Stored in 1Password "Forgejo Secrets" item, field `runner_reg`.
+
+To create a new token:
+1. Go to https://forge.ops.eblu.me/admin/actions/runners
+2. Click "Create new Runner"
+3. Copy the token and update 1Password
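The container build workflow in this card is triggered by tags of the form `<container>-v<version>`. How the workflow itself splits the tag is not shown in this commit; the following is a minimal sketch of one way to do it with POSIX parameter expansion, where `parse_container_tag` is a hypothetical helper name:

```bash
# Hypothetical helper: split "<container>-v<version>" into its two parts.
# Prints "<container> <version>" on success; returns 1 if the tag does not match.
parse_container_tag() {
  tag=$1
  case "$tag" in
    ?*-v?*) ;;                 # require text on both sides of "-v"
    *) echo "not a container release tag: $tag" >&2; return 1 ;;
  esac
  container=${tag%-v*}         # strip the shortest "-v..." suffix
  version=${tag##*-v}          # strip the longest "...-v" prefix
  printf '%s %s\n' "$container" "$version"
}

parse_container_tag "nettest-v1.0.0"          # prints: nettest 1.0.0
parse_container_tag "forgejo-runner-v2.3.1"   # prints: forgejo-runner 2.3.1
```

Taking the shortest suffix for the container name and the longest prefix for the version keeps hyphenated names such as `forgejo-runner` intact.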
--- a/docs/1768283761-TRXN.md
+++ b/docs/1768283761-TRXN.md
@@ -0,0 +1,95 @@
+---
+id: 1768283761-TRXN
+aliases:
+  - prometheus
+tags:
+  - blumeops
+---
+
+# Prometheus Management Log
+
+Prometheus provides metrics storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure, running in Kubernetes (minikube on indri).
+
+## Service Details
+
+- URL: https://prometheus.tail8d86e.ts.net
+- Namespace: `monitoring`
+- Image: `prom/prometheus:v3.2.1`
+- ArgoCD app: `prometheus`
+- Storage: 50Gi PVC
+
+## Data Sources
+
+### Remote Write (from Alloy)
+- Indri system metrics via [[alloy|Grafana Alloy]] remote_write
+- Textfile metrics: minikube, borgmatic, zot, jellyfin
+
+### Scrape Targets
+- `sifaka:9100` - Synology NAS (node_exporter in Docker)
+- `cnpg-metrics.tail8d86e.ts.net:9187` - CloudNativePG PostgreSQL metrics
+- `kube-state-metrics.monitoring.svc:8080` - Kubernetes resource metrics (pods, deployments, etc.)
+
+## Useful Commands
+
+```bash
+# View logs
+kubectl --context=minikube-indri -n monitoring logs -f prometheus-0
+
+# Check targets
+curl -s https://prometheus.tail8d86e.ts.net/api/v1/targets | jq '.data.activeTargets[].scrapeUrl'
+
+# Sync from ArgoCD
+argocd app sync prometheus
+```
+
+## ArgoCD Management
+
+Prometheus is deployed via ArgoCD from `argocd/manifests/prometheus/`:
+- `statefulset.yaml` - StatefulSet with 50Gi PVC
+- `configmap.yaml` - Prometheus configuration
+- `service.yaml` - ClusterIP service
+- `ingress-tailscale.yaml` - Tailscale Ingress
+
+## Log
+
+### Thu Jan 22 2026 (observability cleanup)
+
+- Added kube-state-metrics scrape target for k8s resource metrics
+- Enhanced Minikube dashboard with namespace filtering and resource usage panels
+- Uses `kube_pod_info`, `kube_pod_container_resource_requests`, etc.
+
+### Thu Jan 22 2026 (later)
+
+- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet
+- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net`
+- Remote write endpoint now at k8s service, Alloy updated to push there
+- Retired ansible prometheus role from indri
+- Added ACL grant for `tag:homelab` → `tag:k8s` on port 443 for Alloy access
+
+### Thu Jan 22 2026
+
+Added CNPG PostgreSQL metrics scraping. The CloudNativePG operator exposes Prometheus metrics on port 9187. Exposed via Tailscale at `cnpg-metrics.tail8d86e.ts.net:9187` and added to scrape_configs as job `cnpg-postgres`.
+
+### Thu Jan 15 2026
+
+Prometheus now accepts metrics via remote_write from [[alloy|Grafana Alloy]]. The `--web.enable-remote-write-receiver` flag was added to enable this.
+
+Indri metrics are no longer scraped - they're pushed by Alloy. Sifaka still uses traditional scraping via node_exporter running in Docker on the Synology.
+
+### Tue Jan 13 2026
+
+Prometheus is now managed via ansible in [[1767747119-YCPO|blumeops]]. Configuration files are templated from the ansible role.
+
+### Mon Jan 12 2026 21:56
+
+Prometheus was stood up about a week ago at this point. I am currently renaming
+`localhost` to `indri` in the scrape_configs. While I'm here I'm going to see
+if I can add Synology stats.
+
+I'm adding Container Manager to Sifaka now. I should probably have a Sifaka
+management log, but not yet. Downloaded prom/node-exporter and made a container
+for it. Using the latest tag because I'm nasty.
+
+Done. Adding to scrape configs.
+
+Ok, it didn't like the indri hostname. Could probably fix somehow with either magicdns or /etc/hosts but for now, I'm using `relabel_configs`. This is working. Gotta go to bed.
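The "Check targets" command in this card lists every scrape URL. A sketch of a variant that shows only targets Prometheus does not report as `up`, assuming `curl` and `jq` are installed; `unhealthy_targets` and the `PROM_URL` override are hypothetical names, while `health`, `scrapeUrl`, and `lastError` are fields of the Prometheus `/api/v1/targets` response:

```bash
# Hypothetical helper: list scrape targets whose health is not "up".
# PROM_URL is an illustrative override; the default is the URL from the card.
unhealthy_targets() {
  curl -s "${PROM_URL:-https://prometheus.tail8d86e.ts.net}/api/v1/targets" |
    jq -r '.data.activeTargets[]
           | select(.health != "up")
           | "\(.scrapeUrl)\t\(.lastError)"'
}
```

Run `unhealthy_targets` from a tailnet client; empty output means every active target is up.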
--- a/docs/1768457769-LOCK.md
+++ b/docs/1768457769-LOCK.md
@@ -0,0 +1,149 @@
+---
+id: 1768457769-LOCK
+aliases:
+  - pypi
+  - devpi
+tags:
+  - blumeops
+---
+
+# PyPI / devpi Management Log
+
+PyPI caching proxy running in Kubernetes (minikube on indri) via devpi-server.
+
+## Service Details
+
+- URL: https://pypi.tail8d86e.ts.net
+- Namespace: devpi
+- Image: registry.tail8d86e.ts.net/blumeops/devpi:latest (custom image with devpi-server + devpi-web)
+- ArgoCD app: devpi
+- Storage: 50Gi PVC
+
+## Useful Commands
+
+```bash
+# View logs
+kubectl --context=minikube-indri -n devpi logs -f statefulset/devpi
+
+# Restart pod
+kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
+
+# Check health
+curl https://pypi.tail8d86e.ts.net/+api
+
+# Sync from ArgoCD
+argocd app sync devpi
+```
+
+## ArgoCD Management
+
+Devpi is deployed via ArgoCD from `argocd/manifests/devpi/`:
+- `statefulset.yaml` - StatefulSet with 50Gi PVC
+- `service.yaml` - ClusterIP service
+- `ingress-tailscale.yaml` - Tailscale Ingress for external access
+- `Dockerfile` - Custom image with startup script
+- `start.sh` - Auto-initialization script
+
+## Users and Indices
+
+### Structure
+
+- `root/pypi` - PyPI mirror/cache (auto-created)
+- `eblume/dev` - Private packages index (inherits from root/pypi)
+
+### Creating a User and Index
+
+```bash
+# Login as root
+uvx devpi use https://pypi.tail8d86e.ts.net
+uvx devpi login root
+
+# Create user (prompts for password - store in 1Password)
+uvx devpi user -c USERNAME email=EMAIL
+
+# Create index inheriting from PyPI mirror
+uvx devpi index -c USERNAME/dev bases=root/pypi
+```
+
+### Uploading Packages (with uv)
+
+```bash
+# Store credentials (one-time, prompts for username/password)
+uv auth login https://pypi.tail8d86e.ts.net
+
+# Build and publish
+cd ~/code/personal/your-package
+uv build
+uv publish --publish-url https://pypi.tail8d86e.ts.net/eblume/dev/
+```
+
+Note: The "trusted publishing failed" warning is expected (devpi doesn't support OIDC).
+
+### Uploading Packages (with devpi-client)
+
+```bash
+# Login as the user
+uvx devpi login USERNAME
+
+# Use the index
+uvx devpi use eblume/dev
+
+# Upload from project directory
+uvx devpi upload
+```
+
+## Client Configuration
+
+On workstations, configure pip to use the proxy.
+
+**pip.conf** (`~/.config/pip/pip.conf`):
+```ini
+[global]
+index-url = https://pypi.tail8d86e.ts.net/root/pypi/+simple/
+trusted-host = pypi.tail8d86e.ts.net
+```
+
+After creating/editing, track with chezmoi:
+```bash
+chezmoi add ~/.config/pip/pip.conf
+```
+
+## Credentials
+
+- Root password stored in 1Password (blumeops vault)
+- Injected into k8s via `devpi-root` secret from `secret-root.yaml.tpl`
+
+## Backup
+
+Private packages (`eblume/dev` index) are stored in the devpi PVC. The PyPI mirror cache (`root/pypi`) is not backed up as it can be re-fetched.
+
+**TODO**: Add devpi PVC backup to borgmatic once k8s volume backup strategy is established.
+
+## Related
+
+- [[1767747119-YCPO|BlumeOps project card]]
+- [[argocd|ArgoCD]] for deployment
+- [[minikube|Kubernetes cluster]]
+
+## Log
+
+### Tue Jan 20 2026
+
+- **Migrated to Kubernetes** (Phase 5 of k8s migration)
+- Custom container image with devpi-server + devpi-web + auto-init startup script
+- StatefulSet with 50Gi PVC for data persistence
+- Tailscale Ingress at `pypi.tail8d86e.ts.net`
+- Root password from 1Password secret, auto-initialized on first run
+- Verified pip caching proxy and mcquack package upload
+- **Key learnings:**
+  - Minikube CRI-O can't resolve Tailscale hostnames - added registry mirror config
+  - devpi-web Whoosh indexer needs ~2Gi during initial PyPI index build
+  - Kubernetes auto-sets `DEVPI_PORT` for service discovery - renamed to `DEVPI_LISTEN_PORT`
+- Removed LaunchAgent from indri, cleared Tailscale serve entry
+
+### Previous (indri era)
+
+- Initial setup with devpi on indri via mcquack LaunchAgent
+- Connected via Tailscale at pypi.tail8d86e.ts.net
+- Created eblume/dev index for private packages
+- Metrics collection via textfile exporter
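The pip.conf in this card points every install at `root/pypi`. For a one-off install from a private index such as `eblume/dev`, a sketch that builds the simple-index URL following the same `/<user>/<index>/+simple/` pattern; `devpi_simple_url` and `DEVPI_HOST` are hypothetical names:

```bash
# Hypothetical helper: build the devpi simple-index URL for <user>/<index>.
DEVPI_HOST="${DEVPI_HOST:-https://pypi.tail8d86e.ts.net}"   # URL from the card

devpi_simple_url() {
  printf '%s/%s/%s/+simple/\n' "$DEVPI_HOST" "$1" "$2"
}

devpi_simple_url root pypi    # the mirror index used in pip.conf
devpi_simple_url eblume dev   # the private index
```

For example, `pip install --index-url "$(devpi_simple_url eblume dev)" some-package` would resolve through `eblume/dev`, which inherits from `root/pypi`; `some-package` is a placeholder.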
--- a/docs/1768506761-GHUW.md
+++ b/docs/1768506761-GHUW.md
@@ -0,0 +1,167 @@
+---
+id: 1768506761-GHUW
+aliases:
+  - alloy
+  - grafana-alloy
+tags:
+  - blumeops
+---
+
+# Grafana Alloy Management Log
+
+Grafana Alloy is a unified observability collector with two deployments:
+1. **Indri (host)** - System metrics and service logs from macOS host
+2. **Kubernetes (DaemonSet)** - Automatic pod log collection and service health probes
+
+## Service Details
+
+- Binary: `~/.local/bin/alloy` (built from source with CGO_ENABLED=1)
+- Config: `~/.config/grafana-alloy/config.alloy`
+- Data: `~/.local/share/grafana-alloy/`
+- Logs: `~/Library/Logs/mcquack.alloy.{out,err}.log`
+- Managed via: mcquack LaunchAgent (`mcquack.eblume.alloy`)
+
+**Why built from source?** The Homebrew bottle is built with `CGO_ENABLED=0`, which uses Go's pure DNS resolver. This resolver reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files, breaking Tailscale MagicDNS hostname resolution. Building with `CGO_ENABLED=1` uses the macOS native resolver.
+
+## What Alloy Collects
+
+### Metrics
+- System metrics via `prometheus.exporter.unix` (same metrics as node_exporter)
+- Textfile collector reads from `/opt/homebrew/var/node_exporter/textfile/`
+  - `minikube.prom` - Minikube cluster status
+  - `borgmatic.prom` - Backup status metrics
+  - `zot.prom` - Container registry metrics
+  - `jellyfin.prom` - Jellyfin media server metrics
+- Zot registry metrics scraped from `http://localhost:5050/metrics`
+- Metrics pushed to Prometheus (k8s) via remote_write at `https://prometheus.tail8d86e.ts.net/api/v1/write`
+
+### Logs
+Collects logs from all services on Indri:
+
+**Brew services:**
+- forgejo
+- tailscale
+
+**mcquack LaunchAgents:**
+- alloy (stdout/stderr)
+- borgmatic (stdout/stderr)
+- zot (stdout/stderr)
+- jellyfin (stdout/stderr)
+
+Logs pushed to Loki (k8s) at `https://loki.tail8d86e.ts.net/loki/api/v1/push`.
+
+## Useful Commands
+
+```bash
+# Check service status
+ssh indri 'launchctl list | grep alloy'
+
+# View alloy logs
+ssh indri 'tail -f ~/Library/Logs/mcquack.alloy.err.log'
+
+# Restart service
+ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist && launchctl load ~/Library/LaunchAgents/mcquack.eblume.alloy.plist'
+```
+
+## Building from Source
+
+Alloy must be built with CGO to use macOS native DNS resolver (required for Tailscale MagicDNS):
+
+```bash
+# On gilbert (dev workstation):
+git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy
+cd ~/code/3rd/alloy && mise use go@1.25 node yarn
+mise x -- make alloy
+scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy
+```
+
+Then run ansible to deploy the config and LaunchAgent.
+
+## Ansible Management (Indri)
+
+Alloy on Indri is managed via ansible in [[1767747119-YCPO|blumeops]].
+
+```bash
+mise run provision-indri -- --tags alloy
+```
+
+## Kubernetes Alloy (alloy-k8s)
+
+A separate Alloy DaemonSet runs in k8s for:
+- **Automatic pod log collection** - discovers and collects logs from all pods
+- **Service health probes** - HTTP blackbox probes for k8s services
+
+### Service Details (k8s)
+
+- Namespace: `alloy`
+- Image: `grafana/alloy:v1.8.2`
+- ArgoCD app: `alloy-k8s`
+- Manifests: `argocd/manifests/alloy-k8s/`
+
+### What k8s Alloy Collects
+
+**Pod logs (automatic discovery):**
+- All pods in all namespaces via `loki.source.kubernetes`
+- Labels: namespace, pod, container, node
+
+**Service health probes:**
+- miniflux, kiwix, transmission, devpi, argocd
+- Metrics: `probe_success`, `probe_duration_seconds`
+- Labels: `job="integrations/blackbox/<service>"`
+
+### Useful Commands (k8s Alloy)
+
+```bash
+# View alloy-k8s logs
+kubectl --context=minikube-indri -n alloy logs -f daemonset/alloy
+
+# Check running config
+kubectl --context=minikube-indri -n alloy get configmap alloy-config -o yaml
+
+# Sync from ArgoCD
+argocd app sync alloy-k8s
+```
+
+## Log
+
+### Thu Jan 22 2026 (later)
+
+- **Added Alloy k8s DaemonSet** for automatic pod log collection
+- Logs from all k8s pods now forwarded to Loki with automatic discovery
+- Added service health probes for miniflux, kiwix, transmission, devpi, argocd
+- New "Services Health" Grafana dashboard shows probe metrics
+- Deleted stale textfile metrics (`devpi.prom`, `transmission.prom`) from indri
+- Deleted stale data directories (`/opt/homebrew/var/loki`, `/opt/homebrew/var/prometheus`)
+
+### Thu Jan 22 2026
+
+- **Rebuilt from source with CGO_ENABLED=1** - required for Tailscale MagicDNS resolution
+- Migrated from Homebrew to mcquack LaunchAgent management
+- Updated remote_write to push to k8s Prometheus at `prometheus.tail8d86e.ts.net`
+- Updated log push to k8s Loki at `loki.tail8d86e.ts.net`
+- Removed prometheus/loki log collection (now running in k8s)
+- Binary now at `~/.local/bin/alloy`, config at `~/.config/grafana-alloy/`
+- Added build instructions to ansible role defaults
+
+### Tue Jan 20 2026
+
+- Removed devpi log collection (devpi migrated to k8s)
+- Removed devpi.prom textfile collection (metrics role retired)
+- Removed grafana log collection (grafana migrated to k8s in P2)
+
+### Thu Jan 15 2026
+
+- Initial setup replacing node_exporter
+- Configured metrics push via remote_write to Prometheus
+- Configured log collection for all services, forwarding to Loki
+
+### Thu Jan 30 2026
+
+- Removed Plex log and metrics collection (replaced by Jellyfin)
+- Added Jellyfin log collection via mcquack LaunchAgent logs
+- Added jellyfin.prom textfile metrics
+
+### Wed Jan 15 2026 (later)
+
+- Added Plex Media Server log collection (removed 2026-01-30)
+- Added plex.prom metrics from plex_metrics role (removed 2026-01-30)
--- a/docs/1768506761-XGYX.md
+++ b/docs/1768506761-XGYX.md
@ -0,0 +1,82 @@
+---
+id: 1768506761-XGYX
+aliases:
+  - loki
+tags:
+  - blumeops
+---
+
+# Loki Management Log
+
+Loki is a log aggregation system running in Kubernetes (minikube on indri), providing log storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure.
+
+## Service Details
+
+- URL: https://loki.tail8d86e.ts.net
+- Namespace: `monitoring`
+- Image: `grafana/loki:3.4.2`
+- ArgoCD app: `loki`
+- Storage: 50Gi PVC
+- Retention: 31 days
+
+## Architecture
+
+- Single-node deployment with filesystem storage
+- TSDB index with 24h period
+- Logs collected by [[alloy|Grafana Alloy]] and pushed via Loki API
+- Queried via Grafana using the Loki datasource
+
+## Useful Commands
+
+```bash
+# View logs
+kubectl --context=minikube-indri -n monitoring logs -f loki-0
+
+# Check if Loki is ready
+curl -s https://loki.tail8d86e.ts.net/ready
+
+# Sync from ArgoCD
+argocd app sync loki
+```
+
+## Grafana Integration
+
+Loki is configured as a datasource in Grafana. To explore logs:
+
+1. Go to https://grafana.tail8d86e.ts.net/explore
+2. Select "Loki" datasource
+3. Use LogQL queries:
+   - `{service="forgejo"}` - all forgejo logs
+   - `{service="borgmatic", stream="stderr"}` - borgmatic errors
+   - `{host="indri"} |= "error"` - all logs containing "error"
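
The same queries can also be built on the command line. The sketch below assumes only the label names shown above; `logql_selector` is a hypothetical helper, and the commented `curl` call assumes Loki's standard `query_range` HTTP endpoint is reachable at the service URL.

```bash
# Build a LogQL stream selector from key=value arguments.
# Example: logql_selector service=forgejo  ->  {service="forgejo"}
logql_selector() {
  sel=""
  for pair in "$@"; do
    key=${pair%%=*}   # text before the first '='
    val=${pair#*=}    # text after the first '='
    sel="${sel:+$sel, }$key=\"$val\""
  done
  printf '{%s}' "$sel"
}

# Print selectors matching the first two example queries above.
logql_selector service=forgejo; echo
logql_selector service=borgmatic stream=stderr; echo

# Hypothetical usage against the Loki HTTP API (not run here):
#   curl -sG https://loki.tail8d86e.ts.net/loki/api/v1/query_range \
#     --data-urlencode "query=$(logql_selector host=indri) |= \"error\"" \
#     --data-urlencode 'limit=20'
```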
+
+## ArgoCD Management
+
+Loki is deployed via ArgoCD from `argocd/manifests/loki/`:
+- `statefulset.yaml` - StatefulSet with 50Gi PVC
+- `configmap.yaml` - Loki configuration
+- `service.yaml` - ClusterIP service
+- `ingress-tailscale.yaml` - Tailscale Ingress
+
+## Log
+
+### Fri Jan 23 2026
+
+- Suppressed noisy `v1 Endpoints is deprecated` warning from minikube storage-provisioner ([upstream issue](https://github.com/kubernetes/minikube/issues/21009))
+- Added JSON field extraction for zot compatibility (`message` vs `msg`)
+- Removed logfmt parsing stage - `stage.match` selectors don't prevent Alloy from logging internal decode errors, and most structured logs use JSON anyway
+- Fixed devpi dashboard JSON escaping
+
+### Thu Jan 22 2026
+
+- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet
+- Exposed via Tailscale Ingress at `loki.tail8d86e.ts.net`
+- Alloy updated to push logs to k8s endpoint
+- Retired ansible loki role from indri
+
+### Thu Jan 15 2026
+
+- Initial setup with single-node filesystem storage
+- Configured 31-day retention with compactor
+- Integrated with Grafana as datasource
+- Logs collected via Alloy from all services
--- a/docs/argocd.md
+++ b/docs/argocd.md
@ -0,0 +1,140 @@
+---
+id: argocd
+aliases:
+  - argocd
+  - argo-cd
+tags:
+  - blumeops
+---
+
+# ArgoCD Management Log
+
+ArgoCD provides GitOps continuous delivery for the [[minikube]] cluster on Indri.
+
+## Service Details
+
+- URL: https://argocd.tail8d86e.ts.net
+- Namespace: `argocd`
+- Git source: `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git`
+- Manifests path: `argocd/`
+
+## Sync Policy Decision
+
+**Choice**: Manual sync for workload apps, auto-sync only for app-of-apps.
+
+**Rationale** (decided 2026-01-19 during Phase 1 migration):
+- During migration, we want explicit control over what gets deployed
+- Auto-sync could deploy broken changes while we're still learning the stack
+- The app-of-apps (`apps`) auto-syncs so new Application manifests appear automatically
+- But those Applications have manual sync, so actual workload changes require `argocd app sync <name>`
+
+**Pattern**:
+| Application | Sync Policy | Why |
+|-------------|-------------|-----|
+| `apps` | Automated | Picks up new Application manifests from git |
+| `argocd` | Manual | Self-management changes should be deliberate |
+| `tailscale-operator` | Manual | Infrastructure changes need review |
+| `cloudnative-pg` | Manual | Operator upgrades need care |
+| `blumeops-pg` | Manual | Database changes are sensitive |
+| `grafana` | Manual | Observability stack changes need review |
+| `grafana-config` | Manual | Dashboard changes should be deliberate |
+| `miniflux` | Manual | Application changes need review |
+| `devpi` | Manual | PyPI proxy changes need review |
+
+**Future consideration**: After migration stabilizes, consider enabling auto-sync for stable workloads. Keep manual sync for infrastructure (operators, databases).
+
+## CLI Access
+
+```bash
+# Login (uses Tailscale for network, prompts for password)
+argocd login argocd.tail8d86e.ts.net --grpc-web
+
+# List apps
+argocd app list
+
+# Sync an app
+argocd app sync <app-name>
+
+# Check diff before sync
+argocd app diff <app-name>
+
+# Get app details
+argocd app get <app-name>
+```
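
Since workload apps use manual sync, a small wrapper that always shows the diff before syncing can make the deliberate step harder to skip. This is only a sketch: `review_and_sync` is a hypothetical name, and the exit-code convention for `argocd app diff` (0 for no diff, 1 for a diff, anything else for an error) is an assumption to confirm against the installed CLI.

```bash
# Show the diff for an app, then sync only after explicit confirmation.
review_and_sync() {
  app=$1
  rc=0
  argocd app diff "$app" || rc=$?
  case $rc in
    0) echo "$app: no changes"; return 0 ;;
    1) ;;  # differences found, continue to the prompt
    *) echo "$app: diff failed (exit $rc)" >&2; return "$rc" ;;
  esac
  printf 'Sync %s? [y/N] ' "$app"
  read -r answer || answer=""
  case $answer in
    y|Y) argocd app sync "$app" ;;
    *) echo "$app: skipped" ;;
  esac
}

# Usage (requires a logged-in argocd CLI):
#   review_and_sync miniflux
```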
+
+## Applications
+
+| App | Path | Description |
+|-----|------|-------------|
+| `apps` | `argocd/apps/` | App-of-apps root |
+| `argocd` | `argocd/manifests/argocd/` | ArgoCD self-management |
+| `tailscale-operator` | `argocd/manifests/tailscale-operator/` | Tailscale k8s operator |
+| `cloudnative-pg` | Helm chart (forge mirror) | PostgreSQL operator |
+| `blumeops-pg` | `argocd/manifests/databases/` | PostgreSQL cluster |
+| `prometheus` | `argocd/manifests/prometheus/` | Metrics storage |
+| `loki` | `argocd/manifests/loki/` | Log aggregation |
+| `grafana` | Helm chart (forge mirror) | Grafana dashboards |
+| `grafana-config` | `argocd/manifests/grafana-config/` | Grafana ingress & dashboards |
+| `alloy-k8s` | `argocd/manifests/alloy-k8s/` | Pod log collection & service probes |
+| `kube-state-metrics` | `argocd/manifests/kube-state-metrics/` | K8s resource metrics |
+| `miniflux` | `argocd/manifests/miniflux/` | RSS feed reader |
+| `devpi` | `argocd/manifests/devpi/` | PyPI caching proxy |
+| `torrent` | `argocd/manifests/torrent/` | BitTorrent daemon |
+| `kiwix` | `argocd/manifests/kiwix/` | Offline Wikipedia & ZIM archives |
+| `forgejo-runner` | `argocd/manifests/forgejo-runner/` | Forgejo Actions CI runner (host mode) |
+
+## Credentials
+
+- Admin password stored in 1Password (updated from initial auto-generated password)
+- Git access via deploy key (SSH) stored in 1Password
+
+## Log
+
+### 2026-01-23 (CI/CD Bootstrap Phase 1)
+- Added `forgejo-runner` - Forgejo Actions CI runner
+- Runner uses host mode (jobs run directly in runner container, no Docker needed)
+- Labels: `ubuntu-latest`, `ubuntu-22.04`
+- Note: Stock runner lacks Node.js, so `actions/checkout@v4` doesn't work - use git clone instead
+- See [[forgejo]] for runner token management and workflow examples
+
+### 2026-01-22 (Observability Cleanup)
+- Added `alloy-k8s` - DaemonSet for automatic pod log collection and service health probes
+- Added `kube-state-metrics` - provides k8s resource metrics (pod counts, resource requests, etc.)
+- Enhanced Minikube dashboard with namespace filtering and resource usage panels
+- Added "Services Health" dashboard with probe metrics for all k8s services
+- Fixed macOS dashboard instance variable to only show macOS hosts
+- Cleaned up stale data: removed old textfile metrics and directories from indri
+- Removed stale `/opt/homebrew/var/loki` from borgmatic backup sources
+
+### 2026-01-22 (Phase 7)
+- **Migrated Prometheus and Loki to k8s** - completed observability stack migration
+- Both now running as StatefulSets with 50Gi PVCs
+- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` and `loki.tail8d86e.ts.net`
+- Grafana datasources updated to use k8s-internal service URLs
+- Alloy rebuilt with CGO for Tailscale DNS resolution, pushes to k8s endpoints
+- Retired ansible prometheus and loki roles from indri
+
+### 2026-01-21 (Phase 6)
+- Added torrent (Transmission BitTorrent) to k8s
+- Added kiwix (offline Wikipedia & ZIM archives) to k8s
+- NFS storage from sifaka for shared torrent/ZIM data
+
+### 2026-01-20 (Phase 5)
+- Added devpi (PyPI caching proxy) to k8s
+- Custom container image in zot registry with devpi-server + devpi-web
+- StatefulSet with 50Gi PVC for data persistence
+- Changed `apps` Application to manual sync (was auto-sync with prune)
+
+### 2026-01-19 (Phase 2)
+- Migrated Grafana from Homebrew/Ansible to Kubernetes
+- Helm chart repos now mirrored to forge (cloudnative-pg-charts, grafana-helm-charts)
+- SSH credential template (`repo-creds-forge`) for all forge repos
+- Added indri SSH host key to ArgoCD known_hosts
+- Tailscale service cutover: deleted old svc:grafana from Tailscale admin to free hostname
+- Retired ansible grafana role
+
+### 2026-01-19 (Phase 1)
+- Completed Phase 1 deployment
+- Decided on manual sync policy for workloads
+- Using internal [[forgejo]] as git source (not GitHub mirror)
+- Exposed via Tailscale Ingress with Let's Encrypt TLS
--- a/docs/borgmatic.md
+++ b/docs/borgmatic.md
@ -0,0 +1,176 @@
+---
+id: borgmatic
+aliases:
+  - borgmatic
+  - borg-backup
+tags:
+  - blumeops
+---
+
+# Borgmatic Management Log
+
+Borgmatic runs daily backups from Indri to Sifaka NAS using Borg backup.
+
+## Service Details
+
+- Installed via: mise (pipx)
+- Config: `~/.config/borgmatic/config.yaml` (ansible-managed)
+- Schedule: Daily at 2:00 AM via LaunchAgent
+- Repository: `/Volumes/backups/borg/` on Sifaka
+
+## What Gets Backed Up
+
+**Directories:**
+- `~/code/personal/zk` - Zettelkasten (primary)
+- `/opt/homebrew/var/forgejo` - Git forge data
+- `~/.config/borgmatic` - Borgmatic config itself
+- `~/Documents` - Personal documents
+- `~/Pictures` - Photos (see note below)
+
+**Note on iCloud Photos:** macOS Photos.app defaults to "Optimize Mac Storage", which keeps only thumbnails locally. Borgmatic only backs up what's on disk, so iCloud-only photos are NOT backed up. If you need full photo backups via borgmatic, either disable "Optimize Mac Storage" in Photos preferences or use a tool like osxphotos, which forces downloads. See log entry 2026-01-28.
+
+**Databases:**
+- `miniflux` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me)
+- `teslamate` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me)
+
+**Not backed up (by design):**
+- ZIM archives in `~/transmission/` - re-downloadable via torrent
+- Prometheus metrics - ephemeral data
+- Loki logs - ephemeral (now in k8s PVC)
+- devpi data - in k8s PVC, backup strategy TBD
+
+## PostgreSQL Backup
+
+Borgmatic uses native `postgresql_databases` support to stream `pg_dump` directly to Borg:
+- No intermediate files needed
+- Database keeps running (no downtime)
+- Consistent transactional snapshots
+- Uses `borgmatic` user with `pg_read_all_data` role
+- Password read from `~/.pgpass` (managed by borgmatic ansible role)
+- Uses explicit `pg_dump_command` path (`/opt/homebrew/opt/postgresql@18/bin/pg_dump`) since the LaunchAgent environment doesn't have Homebrew in PATH
+- Uses explicit `local_path` (`/opt/homebrew/bin/borg`) for the same reason
+
+**Databases backed up:**
+- `pg.ops.eblu.me:5432/miniflux` - CloudNativePG cluster in k8s
+- `pg.ops.eblu.me:5432/teslamate` - CloudNativePG cluster in k8s
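
For reference, `~/.pgpass` entries follow the `hostname:port:database:username:password` layout. The sketch below prints entries for the two databases above. The helper names and the password are placeholders, and the real file is generated by the borgmatic ansible role rather than by this script.

```bash
# Escape ':' and '\' in a field, as the .pgpass format requires.
pgpass_escape() {
  printf '%s' "$1" | sed 's/[\\:]/\\&/g'
}

# Print one .pgpass entry: hostname:port:database:username:password
pgpass_line() {
  printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$(pgpass_escape "$5")"
}

# Entries for the two backed-up databases; the password is a placeholder.
for db in miniflux teslamate; do
  pgpass_line pg.ops.eblu.me 5432 "$db" borgmatic 'example:password'
done
```

PostgreSQL clients ignore a `.pgpass` file that is readable by group or others, so the file should be mode `0600`.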
+
+## Ansible Management
+
+Borgmatic is fully managed via ansible in [[1767747119-YCPO|blumeops]]:
+
+```bash
+mise run provision-indri -- --tags borgmatic
+```
+
+The role deploys:
+- `~/.config/borgmatic/config.yaml` - Main configuration
+- LaunchAgent plist for scheduled runs
+
+## Useful Commands
+
+```bash
+# List archives
+ssh indri 'mise x -- borgmatic list'
+
+# Extract from latest archive
+ssh indri 'mise x -- borgmatic extract --archive latest --path /some/path'
+
+# Run backup manually
+ssh indri 'mise x -- borgmatic create --verbosity 1'
+
+# Check repository health
+ssh indri 'mise x -- borgmatic check'
+```
+
+## Retention Policy
+
+- 7 daily backups
+- 12 monthly backups
+- 1000 yearly backups (effectively forever)
+
+## Monitoring
+
+Borgmatic metrics are collected hourly via a script at `~/bin/borgmatic-metrics` and exposed to Prometheus via the node_exporter textfile collector.
+
+View the Grafana dashboard at: https://grafana.tail8d86e.ts.net (select "Borgmatic Backups" dashboard)
+
+Metrics include:
+- `borgmatic_up` - repository accessibility
+- `borgmatic_repo_deduplicated_size_bytes` - actual disk usage
+- `borgmatic_last_archive_original_size_bytes` - size of data being backed up
+- `borgmatic_last_archive_deduplicated_size_bytes` - new data added per backup
+- `borgmatic_archive_count` - number of archives
+- `borgmatic_last_archive_timestamp` - when last backup ran
+
+```bash
+# Check metrics file
+ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom'
+
+# Check metrics LaunchAgent status
+ssh indri 'launchctl list | grep borgmatic-metrics'
+```
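
A quick freshness check can be derived from the metrics file. This sketch assumes `borgmatic_last_archive_timestamp` is written as a Unix timestamp in seconds on a line without labels; `backup_age_hours` is a hypothetical helper, not part of the metrics role.

```bash
# Print the age, in whole hours, of the last archive recorded in a .prom file.
# The optional second argument overrides "now" (Unix seconds) for testing.
backup_age_hours() {
  prom_file=$1
  now=${2:-$(date +%s)}
  awk -v now="$now" '
    $1 == "borgmatic_last_archive_timestamp" {
      printf "%d\n", (now - $2) / 3600
      found = 1
    }
    END { if (!found) exit 1 }
  ' "$prom_file"
}

# Demo with sample data: "now" is set two hours after the sample timestamp.
sample=$(mktemp)
printf '%s\n' 'borgmatic_up 1' 'borgmatic_last_archive_timestamp 1769594400' > "$sample"
backup_age_hours "$sample" 1769601600
rm -f "$sample"
```

On indri this could be pointed at `/opt/homebrew/var/node_exporter/textfile/borgmatic.prom`. With a daily schedule, an age well above 24 hours would suggest a missed run.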
+
+## Log
+
+### Wed Jan 28 2026
+
+- Investigated massive backup size increase (~69GB deduplicated, ~94GB per archive)
+- Root cause: immich-sync role (added Jan 26, removed Jan 28) used osxphotos to export photos
+- **Lesson learned:** osxphotos forces Photos.app to download all iCloud originals locally
+- Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally
+- Before immich-sync: borgmatic was backing up thumbnails (a few GB)
+- After immich-sync: borgmatic now has full 42GB of photo originals
+- This is actually a bonus - provides redundant photo backup alongside iCloud and Immich
+- Retention policy means these photos will be kept in yearly archives essentially forever
+- **Future plan:** Once Immich (on sifaka "photos" volume with Synology offsite backup) is fully set up, Pictures may be removed from borgmatic as redundant
+
+### Fri Jan 23 2026
+
+- Note: Forgejo `app.ini` is now managed by ansible (secrets in 1Password)
+- `/opt/homebrew/var/forgejo` still backed up for git repositories and data
+- But `app.ini` recovery no longer depends on borgmatic (can be regenerated via ansible)
+
+### Thu Jan 22 2026
+
+- Removed `/opt/homebrew/var/loki` from backup sources (stale data from pre-k8s migration)
+- Loki now runs in k8s on its own PVC; logs are treated as ephemeral and are not backed up by design
+- Verified backup integrity after cleanup
+
+### Tue Jan 20 2026 (P5)
+
+- Removed `~/devpi` from backup sources (devpi migrated to k8s)
+- devpi data now in k8s PVC - backup strategy TBD
+
+### Mon Jan 19 2026 (P4)
+
+- Removed localhost PostgreSQL backup (brew pg retired)
+- Updated to back up only `pg.tail8d86e.ts.net` (k8s CloudNativePG)
+- Moved .pgpass management from postgresql role to borgmatic role
+
+### Mon Jan 19 2026 (P3)
+
+- Fixed borgmatic failing to find `borg` binary by adding `local_path` option to config
+- Added k8s-pg (CloudNativePG cluster) backup alongside brew PostgreSQL
+- Added ACL grant for `tag:homelab` → `tag:k8s` on port 5432 for backup access
+- Successfully tested disaster recovery: restored miniflux data from borgmatic dump to k8s-pg
+- Created `borgmatic` user in k8s-pg via CloudNativePG managed roles
+- Both localhost and k8s-pg databases backed up during migration period
+
+### Sun Jan 18 2026
+
+- Fixed borgmatic-metrics script failing in LaunchAgent context by using absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`) instead of `mise x -- borg`
+- This was causing the Grafana dashboard to show "Repository Status: DOWN" and missing time series data
+
+### Sat Jan 17 2026
+
+- Fixed PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found")
+- Removed `~/code/3rd/kiwix-tools` from backups (was just symlinks, ZIM archives are re-downloadable)
+- Enabled Loki log backup (removed from exclude_patterns)
+- Added borgmatic_metrics role for Prometheus metrics collection
+- Added Grafana dashboard for backup monitoring (size trends, dedup ratio, time since last backup)
+
+### Fri Jan 16 2026
+
+- Moved config from manual management to ansible-managed template
+- Added `postgresql_databases` backup for miniflux database
+- Config now deployed via `ansible/roles/borgmatic/templates/config.yaml.j2`
--- a/docs/external-secrets.md
+++ b/docs/external-secrets.md
@ -0,0 +1,75 @@
+---
+id: external-secrets
+aliases:
+  - external-secrets
+  - eso
+  - external-secrets-operator
+tags:
+  - blumeops
+---
+
+# External Secrets Operator
+
+External Secrets Operator (ESO) syncs secrets from 1Password to Kubernetes Secrets via 1Password Connect.
+
+## Architecture
+
+```
+1Password Cloud
+      |
+      v
+1Password Connect (namespace: 1password)
+      |
+      v
+External Secrets Operator (namespace: external-secrets)
+      |
+      v
+Native Kubernetes Secrets
+```
+
+## Usage
+
+ClusterSecretStore `onepassword-blumeops` provides access to the blumeops vault. See `argocd/manifests/devpi/external-secret.yaml` for a simple example.
+
+**Important:** 1Password Connect doesn't support the `?ssh-format=openssh` query parameter. SSH keys must be stored as Secure Notes with the OpenSSH-formatted key (see `argocd-forge-ssh-key` item).
+
+```bash
+# Check all ExternalSecrets
+kubectl --context=minikube-indri get externalsecret -A
+
+# Find 1Password field names
+op item get <item> --vault blumeops --format json | jq '.fields[] | .label'
+```
+
+## Bootstrap (One-Time Setup)
+
+If reinstalling from scratch:
+
+1. Create Connect server credentials:
+   ```bash
+   op connect server create blumeops --vaults blumeops
+   op connect token create blumeops --server <server-id> --vault blumeops
+   ```
+
+2. Store in 1Password item "1Password Connect":
+   - `credentials-file`: raw JSON
+   - `credentials-base64`: base64-encoded JSON
+   - `token`: access token
+
+3. Apply bootstrap secret:
+   ```bash
+   kubectl --context=minikube-indri create namespace 1password
+   op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | \
+     kubectl --context=minikube-indri apply -f -
+   ```
+
+4. Sync apps in order:
+   - `argocd app sync 1password-connect`
+   - `argocd app sync external-secrets-crds`
+   - `argocd app sync external-secrets`
+   - `argocd app sync external-secrets-config`
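
The ordered sync in step 4 can be scripted so that a failure stops the sequence before dependent apps are touched. This is a sketch only; `sync_in_order` is a hypothetical helper around the `argocd app sync` command used above.

```bash
# Sync ArgoCD apps one at a time, stopping at the first failure.
sync_in_order() {
  for app in "$@"; do
    echo "syncing $app"
    argocd app sync "$app" || {
      echo "sync failed: $app" >&2
      return 1
    }
  done
}

# Usage (requires a logged-in argocd CLI):
#   sync_in_order 1password-connect external-secrets-crds \
#     external-secrets external-secrets-config
```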
+
+## Related
+
+- [[1767747119-YCPO|BlumeOps]]
+- [[argocd|ArgoCD]]
--- a/docs/grafana.md
+++ b/docs/grafana.md
@ -0,0 +1,58 @@
+---
+id: grafana
+aliases:
+  - grafana
+tags:
+  - blumeops
+---
+
+# Grafana Management Log
+
+Grafana provides dashboards and observability for [[blumeops]].
+
+## Service Details
+
+- URL: https://grafana.ops.eblu.me (also https://grafana.tail8d86e.ts.net)
+- Namespace: `monitoring`
+- Helm chart: grafana (mirrored to forge)
+- Values: `argocd/manifests/grafana/values.yaml`
+- Dashboards: `argocd/manifests/grafana-config/dashboards/`
+
+## Embedding Note
+
+Grafana panel embedding via iframes was attempted for Homepage but didn't work well:
+- Homepage's iframe widget doesn't support width constraints (only height)
+- Grafana's "Public Dashboards" feature doesn't support template variables or PostgreSQL datasources
+- Anonymous auth would be required, which exposes all dashboards
+
+Current config has `allow_embedding: false`. If revisiting this, see git history for the iframe attempt (2026-01-30).
+
+## Datasources
+
+| Name | Type | URL |
+|------|------|-----|
+| Prometheus | prometheus | `http://prometheus.monitoring.svc.cluster.local:9090` |
+| Loki | loki | `http://loki.monitoring.svc.cluster.local:3100` |
+| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` |
+
+## Dashboard Provisioning
+
+Dashboards are provisioned via ConfigMaps with label `grafana_dashboard: "1"`. The sidecar watches for these and loads them automatically.
+
+To add a dashboard:
+1. Create ConfigMap in `argocd/manifests/grafana-config/dashboards/`
+2. Add label `grafana_dashboard: "1"`
+3. Optionally add annotation `grafana_folder: "FolderName"` for organization
+4. Sync the `grafana-config` ArgoCD app
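
As an illustration of steps 1 to 3, the sketch below wraps an exported dashboard JSON file in a ConfigMap carrying the label and folder annotation described above. The `dashboard_configmap` helper, the `monitoring` namespace for the ConfigMap, and the `<name>.json` data key are assumptions; the existing files in `argocd/manifests/grafana-config/dashboards/` are the authoritative pattern.

```bash
# Print a ConfigMap manifest wrapping a Grafana dashboard JSON file.
# Arguments: ConfigMap name, path to dashboard JSON, Grafana folder name.
dashboard_configmap() {
  name=$1 json_file=$2 folder=$3
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
  annotations:
    grafana_folder: "$folder"
data:
  $name.json: |
EOF
  # Indent the JSON so it sits inside the YAML block scalar.
  sed 's/^/    /' "$json_file"
}

# Demo with a tiny placeholder dashboard.
sample=$(mktemp)
printf '%s\n' '{"title": "Example"}' > "$sample"
dashboard_configmap example-dashboard "$sample" Examples
rm -f "$sample"
```

Redirecting the output into the dashboards directory and then syncing `grafana-config` would complete step 4.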
+
+## Log
+
+### 2026-01-30
+- Attempted Grafana iframe embeds for Homepage metrics panel
+- Issues: width constraints don't work, some panels fail to load
+- Reverted to authenticated-only access (no anonymous auth)
+
+### 2026-01-19 (Phase 2)
+- Migrated from Homebrew/Ansible to Kubernetes
+- Helm chart mirrored to forge
+- Exposed via Tailscale Ingress
--- a/docs/indri.md
+++ b/docs/indri.md
@ -0,0 +1,65 @@
+---
+id: indri
+aliases:
+  - indri
+  - mac-mini
+tags:
+  - blumeops
+---
+
+# Indri Maintenance Log
+
+Indri is a Mac Mini M1 (2020) serving as the primary [[1767747119-YCPO|BlumeOps]] server.
+
+## Host Details
+
+- Model: Mac mini M1, 2020 (Macmini9,1)
+- Storage: 2TB internal SSD
+- macOS: 15.7.3 (Sequoia)
+- Role: Primary server for homelab services
+
+## Passwordless Sudo
+
+Configured passwordless sudo for the `erichblume` user to allow ansible `become: true` tasks to run without password prompts:
+
+```bash
+# Config at /etc/sudoers.d/erichblume
+erichblume ALL=(ALL) NOPASSWD: ALL
+```
+
+This is acceptable given the security model - tailnet access is the trust boundary.
+
+## Sleep Prevention
+
+Indri must stay awake to serve network requests. Currently using **Amphetamine** (App Store) to prevent sleep.
+
+**Configuration:**
+- Start Session At Launch: enabled
+- Default Duration: indefinite
+- Allow Closed-Display Sleep: enabled (no display attached)
+
+**Known Issue:** Amphetamine can crash after extended uptime (~12 days observed), leaving the system unprotected. If this becomes a recurring problem, consider switching to system-level sleep prevention:
+
+```bash
+# Option 1: Disable sleep via pmset (requires sudo)
+sudo pmset -c sleep 0 displaysleep 0
+
+# Option 2: Use caffeinate daemon via LaunchAgent
+# Create ~/Library/LaunchAgents/com.local.caffeinate.plist
+caffeinate -s  # -s = prevent sleep on AC power
+```
+
+These could be managed via ansible for reliability.
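
If the `pmset` route is taken, a check like the one below could confirm the settings stuck. It is a sketch that assumes `pmset -g` prints one `name value` pair per line (for example ` sleep 0`); both helper names are hypothetical.

```bash
# Read `pmset -g` style output on stdin and print the value of one setting.
pmset_value() {
  awk -v key="$1" '$1 == key { print $2 }'
}

# Succeed only if both system sleep and display sleep are disabled (0).
sleep_disabled() {
  settings=$(cat)
  [ "$(printf '%s\n' "$settings" | pmset_value sleep)" = "0" ] &&
    [ "$(printf '%s\n' "$settings" | pmset_value displaysleep)" = "0" ]
}

# Demo with sample text in the assumed format.
printf '%s\n' ' sleep                0' ' displaysleep         0' |
  sleep_disabled && echo "sleep disabled"

# On indri (not run here):
#   pmset -g | sleep_disabled || echo "WARNING: sleep is enabled"
```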
+
+## Log
+
+### Tue Jan 20 2026
+
+**Amphetamine crash caused overnight sleep**
+
+- Amphetamine 5.3.2 crashed at 19:08 on Jan 19 (segfault in `objc_release` during timer callback)
+- System went to sleep at 19:20, stayed asleep overnight
+- Discovered when services were unreachable; manually restarted Amphetamine at ~07:30
+- Crash report: `~/Library/Logs/DiagnosticReports/Amphetamine-2026-01-19-190921.ips`
+- Root cause: Memory management bug in Amphetamine during long-running session (~12 days uptime)
+- Action: Monitoring for now; if it recurs, will implement `pmset`/`caffeinate` via ansible
--- a/docs/jellyfin.md
+++ b/docs/jellyfin.md
@ -0,0 +1,90 @@
+---
+id: jellyfin
+aliases:
+  - jellyfin
+tags:
+  - blumeops
+---
+
+# Jellyfin Management Log
+
+Jellyfin is a free, open-source media server running natively on [[indri|Indri]] for full VideoToolbox hardware transcoding support.
+
+## Service Details
+
+- URL: https://jellyfin.ops.eblu.me
+- Port: 8096 (localhost only, proxied via Caddy)
+- Data directory: `~/Library/Application Support/jellyfin`
+- Media path: `/Volumes/allisonflix` (NFS from sifaka)
+- LaunchAgent: `mcquack.jellyfin`
+
+## Useful Commands
+
+```bash
+# Check LaunchAgent status
+ssh indri 'launchctl list | grep jellyfin'
+
+# View logs
+ssh indri 'tail -f ~/Library/Logs/mcquack.jellyfin.err.log'
+
+# Check port is listening
+ssh indri 'lsof -nP -iTCP:8096 -sTCP:LISTEN'
+
+# Restart Jellyfin
+ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist && launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist'
+
+# Check metrics
+ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/jellyfin.prom'
+```
+
+## Hardware Transcoding
+
+Jellyfin uses Apple VideoToolbox for hardware-accelerated transcoding on the M1 Mac Mini.
+
+**Capabilities:**
+- H.264 encode/decode: Hardware
+- HEVC (H.265) encode/decode: Hardware
+- AV1 decode: Software only (requires M3+)
+- HDR to SDR tone mapping: VPP (hardware)
+- Concurrent 4K streams: ~3 with HDR tonemapping
+
+**Configuration** (Dashboard > Playback):
+1. Hardware Acceleration: Apple VideoToolbox
+2. Allow hardware encoding: Enabled
+3. VPP Tone mapping: Enabled (for HDR to SDR)
+
+## Observability
+
+- Metrics: Collected via `jellyfin_metrics` ansible role to Prometheus textfile
+- Logs: Forwarded to Loki via Alloy (`service="jellyfin"`)
+- Dashboard: "Jellyfin Media Server" in Grafana
+
+### Metrics collected:
+- `jellyfin_up` - Server availability
+- `jellyfin_version_info` - Server version
+- `jellyfin_library_items{library,type}` - Library counts
+- `jellyfin_sessions_total` - Active sessions
+- `jellyfin_sessions_playing` - Playing sessions
+- `jellyfin_transcode_sessions_total` - Transcoding sessions
+
+## API Key Setup
+
+Metrics collection requires an API key:
+
+1. Open https://jellyfin.ops.eblu.me
+2. Go to Dashboard > API Keys > Add
+3. Create key with description "metrics"
+4. Save to indri:
+   ```bash
+   ssh indri 'echo "YOUR_API_KEY" > ~/.jellyfin-api-key && chmod 600 ~/.jellyfin-api-key'
+   ```
+
+## Log
+
+### 2026-01-30 (Initial Deployment)
+- Deployed Jellyfin natively on indri via Ansible
+- Installed via Homebrew cask, managed via LaunchAgent
+- Added Caddy routing for `jellyfin.ops.eblu.me`
+- Added metrics collection (jellyfin_metrics role)
+- Added log collection via Alloy
+- Created Grafana dashboard
--- a/docs/kiwix.md
+++ b/docs/kiwix.md
@ -0,0 +1,103 @@
+---
+id: kiwix
+aliases:
+  - kiwix
+tags:
+  - blumeops
+---
+
+# Kiwix Management Log
+
+Kiwix serves offline Wikipedia (and other ZIM archives) in Kubernetes via Tailscale at https://kiwix.tail8d86e.ts.net.
+
+## Service Details
+
+- URL: https://kiwix.tail8d86e.ts.net
+- Namespace: `kiwix`
+- Image: `ghcr.io/kiwix/kiwix-serve:3.8.1`
+- ArgoCD app: `kiwix`
+- Storage: NFS mount from sifaka (`/volume1/torrents`)
+
+## Architecture
+
+The kiwix deployment has two components:
+
+1. **kiwix-serve** - Main container serving ZIM files on port 80
+2. **torrent-sync** - Sidecar that syncs declarative ZIM torrent list to Transmission
+
+A CronJob (`zim-watcher`) runs hourly to detect new ZIM files and trigger a deployment restart when needed.
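
The detection step can be pictured as a comparison between the current ZIM listing and a saved snapshot. The sketch below is not the actual `zim-watcher` script (see `cronjob-zim-watcher.yaml` for that); the function names and the state-file approach are assumptions chosen for illustration.

```bash
# List ZIM files under a directory in a stable order.
zim_listing() {
  find "$1" -name '*.zim' | sort
}

# Return success if the ZIM listing differs from the saved snapshot,
# and update the snapshot either way.
zims_changed() {
  dir=$1 state=$2
  new=$(zim_listing "$dir")
  old=""
  if [ -f "$state" ]; then
    old=$(cat "$state")
  fi
  printf '%s\n' "$new" > "$state"
  [ "$new" != "$old" ]
}

# Demo in a temporary directory: the first check sees a new file,
# the second sees no change.
demo=$(mktemp -d)
touch "$demo/wikipedia.zim"
zims_changed "$demo" "$demo/.zim-state" && echo "restart needed"
zims_changed "$demo" "$demo/.zim-state" || echo "no change"
rm -rf "$demo"
```

In the cluster, a detected change would be followed by something like `kubectl --context=minikube-indri -n kiwix rollout restart deployment/kiwix`.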
+
+## Useful Commands
+
+```bash
+# View kiwix logs
+kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c kiwix-serve
+
+# View torrent sync logs
+kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c torrent-sync
+
+# Check ZIM watcher job
+kubectl --context=minikube-indri -n kiwix get cronjob zim-watcher
+
+# Manually trigger ZIM watcher
+kubectl --context=minikube-indri -n kiwix create job --from=cronjob/zim-watcher zim-watcher-manual
+
+# Sync from ArgoCD
+argocd app sync kiwix
+```
+
+## ArgoCD Management
+
+Kiwix is deployed via ArgoCD from `argocd/manifests/kiwix/`:
+- `deployment.yaml` - Kiwix-serve + torrent-sync sidecar
+- `service.yaml` - ClusterIP service
+- `ingress-tailscale.yaml` - Tailscale Ingress
+- `configmap-zim-torrents.yaml` - Declarative list of ZIM torrents to download
+- `configmap-sync-script.yaml` - Script to sync torrents to Transmission
+- `cronjob-zim-watcher.yaml` - Hourly job to restart kiwix on new ZIMs
+
+## Adding New ZIM Archives
+
+1. Edit `argocd/manifests/kiwix/configmap-zim-torrents.yaml`
+2. Add the torrent URL from https://download.kiwix.org/zim/
+3. Sync the kiwix app: `argocd app sync kiwix`
+4. The torrent-sync sidecar will add the torrent to [[transmission|Transmission]]
+5. Once downloaded, the zim-watcher CronJob will detect it and restart kiwix
+
+## Configured Archives
+
+The declarative torrent list includes:
+- Wikipedia top 1M English articles with images
+- Project Gutenberg (60,000+ public domain books)
+- iFixit repair guides
+- Stack Exchange sites (SuperUser, Math, etc.)
+- LibreTexts textbooks (Bio, Chem, Eng, Math, Phys, Humanities)
+- DevDocs (developer documentation bundles)
+
+See `argocd/manifests/kiwix/configmap-zim-torrents.yaml` for the full list.
+
+## Storage
+
+ZIM files are stored on sifaka NAS at `/volume1/torrents/complete/`. The kiwix pod mounts this directory via NFS.
+
+**Note**: The NFS mount works because minikube uses the docker driver, which NATs through indri's LAN IP, allowing direct access to sifaka.
+
+## Log
+
+### 2026-01-21 (P6)
+- **Migrated to Kubernetes** (Phase 6 of k8s migration)
+- Direct NFS mount from sifaka (no PVC, shared with transmission)
+- Torrent-sync sidecar adds configured ZIMs to Transmission
+- ZIM-watcher CronJob restarts deployment when new files appear
+- Tailscale Ingress at `kiwix.tail8d86e.ts.net`
+- Retired ansible kiwix role from indri
+
+### 2026-01-14
+- Added transmission integration for background torrent downloads
+- Enabled Gutenberg, iFixit, SuperUser, Math SE, and all LibreTexts archives
+
+### 2026-01-13
+- Added kiwix role to ansible playbook
+- Operationalized ZIM archive downloads with configurable list
+- Initial setup with kiwix-tools binary on indri
+- Managed via LaunchAgent on port 5501
--- a/docs/miniflux.md
+++ b/docs/miniflux.md
@ -0,0 +1,83 @@
+---
+id: miniflux
+aliases:
+  - miniflux
+  - feed
+  - rss
+tags:
+  - blumeops
+---
+
+# Miniflux Management Log
+
+Miniflux is a minimalist RSS/Atom feed reader running in Kubernetes (minikube on indri).
+
+## Service Details
+
+- URL: https://feed.tail8d86e.ts.net
+- Namespace: miniflux
+- Image: ghcr.io/miniflux/miniflux:latest
+- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net)
+- ArgoCD app: miniflux
+
+## Useful Commands
+
+```bash
+# View logs
+kubectl -n miniflux logs -f deployment/miniflux
+
+# Restart deployment
+kubectl -n miniflux rollout restart deployment/miniflux
+
+# Check health
+curl https://feed.tail8d86e.ts.net/healthcheck
+
+# Sync from ArgoCD
+argocd app sync miniflux
+```
+
+## ArgoCD Management
+
+Miniflux is deployed via ArgoCD from `argocd/manifests/miniflux/`:
+- `deployment.yaml` - Deployment with environment configuration
+- `service.yaml` - ClusterIP service
+- `ingress-tailscale.yaml` - Tailscale Ingress for external access
+
+## Credentials
+
+The miniflux database user password is auto-generated by CloudNativePG and stored in the `blumeops-pg-app` secret in the databases namespace.
+
+To recreate the miniflux-db secret:
+```bash
+kubectl create secret generic miniflux-db -n miniflux \
+  --from-literal=url="$(kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)"
+```
+
+## Features
+
+- Keyboard shortcuts for efficient reading
+- Fever and Google Reader API compatible
+- Mobile-friendly web interface
+- OPML import/export
+- Content scraping for full articles
+
+## Backup
+
+Feed subscriptions and read state are stored in [[postgresql]] and backed up via borgmatic's `postgresql_databases` hook.
+
+## Log
+
+### Mon Jan 19 2026
+
+- **Migrated to Kubernetes** (Phase 4 of k8s migration)
+- Deployed via ArgoCD in `miniflux` namespace
+- Database connection via internal k8s DNS to CloudNativePG cluster
+- Exposed via Tailscale Ingress at feed.tail8d86e.ts.net
+- Removed brew miniflux service and ansible role from indri
+- Fixed table ownership issue after P3 restore (tables were owned by eblume, needed to be owned by miniflux)
+
+### Fri Jan 16 2026
+
+- Initial setup with Miniflux 2.x on brew
+- Connected to PostgreSQL 18 on localhost
+- Exposed via Tailscale at feed.tail8d86e.ts.net
--- a/docs/minikube.md
+++ b/docs/minikube.md
@ -0,0 +1,137 @@
+---
+id: minikube
+aliases:
+  - minikube
+  - kubernetes
+  - k8s
+tags:
+  - blumeops
+---
+
+# Minikube Management Log
+
+Minikube provides a single-node Kubernetes cluster on Indri for running containerized services.
+
+## Cluster Details
+
+- Driver: **docker** (runs as container inside Docker Desktop)
+- Container runtime: docker
+- Kubernetes version: v1.34.0
+- Resources: 6 CPUs, 11GB RAM (leaves 1GB for Docker Desktop overhead), 200GB disk
+- API server: https://k8s.tail8d86e.ts.net (Tailscale service with TCP passthrough)
+- Internal port: dynamic (currently 50820 - Docker maps a random host port to the container's 6443)
+
+**Prerequisites:** Docker Desktop must be installed and running with at least 12GB memory allocated.
+
+## Remote Access from Gilbert
+
+Run `mise run ensure-minikube-indri-kubectl-config` to set up kubectl access. This script:
+1. Fetches certificates from indri via SSH
+2. Creates kubeconfig at `~/.kube/minikube-indri/config.yml`
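+
+The resulting kubeconfig plausibly resembles the sketch below. The context name and server URL come from this card; the user name and certificate file names are assumptions for illustration (relative paths in a kubeconfig resolve against the directory containing the file).
+
+```yaml
+# Hypothetical ~/.kube/minikube-indri/config.yml
+apiVersion: v1
+kind: Config
+clusters:
+  - name: minikube-indri
+    cluster:
+      server: https://k8s.tail8d86e.ts.net
+      certificate-authority: ca.crt      # assumed file name
+users:
+  - name: minikube-indri                 # assumed user name
+    user:
+      client-certificate: client.crt     # assumed file name
+      client-key: client.key             # assumed file name
+contexts:
+  - name: minikube-indri
+    context:
+      cluster: minikube-indri
+      user: minikube-indri
+```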
+
+**Fish abbreviations** (in `~/.config/fish/config.fish`):
+- `ki` -> `kubectl --context=minikube-indri`
+- `k9i` -> `k9s --context=minikube-indri`
+- `k9` -> `k9s`
+
+```bash
+# Quick access via abbreviations
+ki get nodes
+k9i
+
+# Or explicitly set context
+kubectl config use-context minikube-indri
+kubectl get nodes
+```
+
+## Volume Mounting (for P6 kiwix/transmission)
+
+**Direct NFS from pods to sifaka** - tested and working.
+
+Docker NATs outbound traffic through indri's LAN IP (192.168.1.50). Sifaka's NFS exports allow:
+- `192.168.1.0/24` - Docker containers via indri NAT
+- `100.64.0.0/10` - Tailscale clients
+
+Pods mount NFS directly:
+```yaml
+volumes:
+  - name: torrents
+    nfs:
+      server: sifaka
+      path: /volume1/torrents
+```
+
+No LaunchAgents, no `minikube mount`, no hostPath complexity needed.
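+
+A fuller sketch of a pod using this volume, which can double as a quick mount check. Only the NFS server and path come from this card; the pod name, image, and mount path are hypothetical.
+
+```yaml
+# Hypothetical test pod: lists the share, then idles so it can be inspected
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nfs-check                 # assumed name
+spec:
+  containers:
+    - name: shell
+      image: busybox:1.36         # assumed image
+      command: ["sh", "-c", "ls /torrents && sleep 3600"]
+      volumeMounts:
+        - name: torrents
+          mountPath: /torrents    # assumed mount path
+  volumes:
+    - name: torrents
+      nfs:
+        server: sifaka
+        path: /volume1/torrents
+```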
+
+## Useful Commands (on indri)
+
+```bash
+# Cluster status
+minikube status
+
+# Start/stop cluster
+minikube start
+minikube stop
+
+# Access dashboard
+minikube dashboard
+
+# SSH into node
+minikube ssh
+
+# View logs
+minikube logs
+
+# Get API server URL (shows current port)
+kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
+```
+
+## Registry Mirror (Zot)
+
+Containerd is configured to use [[zot]] on indri as a pull-through cache for container images. This is managed by the ansible `minikube` role.
+
+Config location: `/etc/containerd/certs.d/<registry>/hosts.toml` (inside minikube container)
+
+With the docker driver, containerd reaches zot on the host at `host.minikube.internal:5050`.
+
+Mirrors configured for:
+- `registry.ops.eblu.me` (private images)
+- `docker.io`
+- `ghcr.io`
+- `quay.io`
+
+To verify the mirror is working:
+```bash
+# Check zot's cached images
+curl -s http://localhost:5050/v2/_catalog | jq
+```
+
+## Log
+
+### 2026-01-21 (Docker Driver Migration)
+- **Migrated from qemu2 to docker driver** (Phase 5.1)
+- qemu2 had Tailscale TCP proxy issue (TLS handshake timeout to VM IP)
+- docker driver puts API server on localhost, which Tailscale serve handles correctly
+- Removed socket_vmnet, qemu dependencies
+- Removed NFS/minikube-mount LaunchAgents (will re-add NFS for P6 with simpler hostPath approach)
+- API server port is now dynamic (Docker assigns random host port)
+- Ansible role updated to query port and configure tailscale serve accordingly
+- Created `mise run ensure-minikube-indri-kubectl-config` for workstation setup
+
+### 2026-01-21 (QEMU2 Migration - superseded)
+- Migrated from podman to qemu2 driver
+- Podman driver had fundamental limitations preventing volume mounts
+- qemu2 created actual VM with full kernel capabilities
+- Volume mounting solution: NFS on host + minikube mount passthrough
+- **Issue discovered:** Tailscale TCP proxy to VM IP (192.168.105.2:6443) fails with TLS timeout
+
+### 2026-01-19
+- Configured CRI-O registry mirror to use zot as pull-through cache
+- Added ansible automation to apply mirror config on provisioning
+- Fixed ansible hanging: `minikube ssh` with piped stdin requires `--native-ssh=false`
+
+### 2026-01-18
+- Initial cluster setup for k8s migration Phase 0
+- Configured for remote access with --apiserver-names=indri
+- 1Password credential integration for kubectl from gilbert
+- Exposed as Tailscale service `k8s.tail8d86e.ts.net` with TCP passthrough
--- a/docs/navidrome.md
+++ b/docs/navidrome.md
@ -0,0 +1,80 @@
+---
+id: navidrome
+aliases:
+  - DJ
+tags:
+  - blumeops
+  - service
+---
+
+Navidrome is a self-hosted music streaming server deployed on [[blumeops|BlumeOps]].
+
+# Access
+
+- **Primary URL**: https://dj.ops.eblu.me (via Caddy)
+- **Tailscale URL**: https://dj.tail8d86e.ts.net
+
+# Deployment
+
+Navidrome runs in Kubernetes (minikube on [[indri]]) and is managed via [[argocd|ArgoCD]].
+
+**Manifests**: `argocd/manifests/navidrome/`
+
+## Storage
+
+| Mount   | Type              | Source                  | Access     |
+|---------|-------------------|-------------------------|------------|
+| /music  | NFS PV            | sifaka:/volume1/music   | Read-only  |
+| /data   | Local PVC (10Gi)  | minikube storage class  | Read-write |
+
+The `/data` directory contains:
+- SQLite database
+- Configuration
+- Cache files
+
+## Configuration
+
+Environment variables set in deployment:
+- `ND_SCANSCHEDULE=1h` - Rescan library every hour
+- `ND_LOGLEVEL=info` - Standard logging level
+- `ND_MUSICFOLDER=/music` - Music library path
+- `ND_DATAFOLDER=/data` - Data directory path
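+
+Put together, the storage table and environment variables above might appear in the Deployment roughly as follows. This is a sketch only: the image and PVC claim names are assumptions, and the real manifests live in `argocd/manifests/navidrome/`.
+
+```yaml
+# Hypothetical fragment of the navidrome Deployment pod spec
+containers:
+  - name: navidrome
+    image: deluan/navidrome          # assumed image; pin a tag in practice
+    env:
+      - name: ND_SCANSCHEDULE
+        value: "1h"
+      - name: ND_LOGLEVEL
+        value: "info"
+      - name: ND_MUSICFOLDER
+        value: /music
+      - name: ND_DATAFOLDER
+        value: /data
+    volumeMounts:
+      - name: music
+        mountPath: /music
+        readOnly: true               # music library is mounted read-only
+      - name: data
+        mountPath: /data
+volumes:
+  - name: music
+    persistentVolumeClaim:
+      claimName: navidrome-music     # assumed claim bound to the NFS PV
+  - name: data
+    persistentVolumeClaim:
+      claimName: navidrome-data      # assumed 10Gi local claim
+```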
+
+## Initial Setup
+
+On first access, Navidrome prompts you to create an admin user. There are no default credentials.
+
+# Operations
+
+## Sync Application
+
+```bash
+argocd app sync navidrome
+```
+
+## Check Status
+
+```bash
+argocd app get navidrome
+kubectl --context=minikube-indri -n navidrome get pods
+kubectl --context=minikube-indri -n navidrome logs deploy/navidrome
+```
+
+## Verify NFS Mount
+
+```bash
+kubectl --context=minikube-indri -n navidrome exec deploy/navidrome -- ls /music
+```
+
+## Force Library Rescan
+
+Access Settings > Library in the web UI, or trigger via API:
+```bash
+curl -X POST https://dj.ops.eblu.me/api/library/scan -H "x-nd-authorization: Bearer <token>"
+```
+
+# Related
+
+- [[jellyfin]] - Video streaming (runs on indri directly)
+- [[argocd]] - GitOps deployment
+- [[blumeops]] - Infrastructure overview
--- a/docs/postgresql.md
+++ b/docs/postgresql.md
@ -0,0 +1,131 @@
+---
+id: postgresql
+aliases:
+  - postgresql
+  - postgres
+  - pg
+tags:
+  - blumeops
+---
+
+# PostgreSQL Management Log
+
+PostgreSQL database cluster running in Kubernetes (minikube on indri) via CloudNativePG operator, providing storage for [[miniflux]] and other services.
+
+## Quick Connect
+
+```bash
+# Connect as superuser (fetches password from 1Password)
+PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -d miniflux
+```
+
+## Service Details
+
+- URL: tcp://pg.tail8d86e.ts.net:5432
+- Metrics: http://cnpg-metrics.tail8d86e.ts.net:9187/metrics
+- Namespace: databases
+- Cluster name: blumeops-pg
+- Operator: CloudNativePG
+- ArgoCD app: blumeops-pg
+
+## Databases
+
+| Database | Owner    | Purpose                    |
+|----------|----------|----------------------------|
+| miniflux | miniflux | Miniflux feed reader data  |
+
+## Users
+
+| User      | Role             | Purpose                |
+|-----------|------------------|------------------------|
+| postgres  | superuser        | CNPG internal          |
+| miniflux  | app owner        | Owns miniflux database |
+| eblume    | superuser        | Admin access           |
+| borgmatic | pg_read_all_data | Backup access          |
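+
+The non-application users are declared through CloudNativePG managed roles. A minimal sketch of what that section of the Cluster spec could look like, assuming the secret names listed under Credentials; required fields such as `instances` and `storage` are omitted, and the actual spec is in `argocd/manifests/databases/blumeops-pg.yaml`.
+
+```yaml
+# Hypothetical excerpt; not the full Cluster spec
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: blumeops-pg
+  namespace: databases
+spec:
+  managed:
+    roles:
+      - name: borgmatic
+        ensure: present
+        login: true
+        inRoles:
+          - pg_read_all_data             # read-only access for backups
+        passwordSecret:
+          name: blumeops-pg-borgmatic
+      - name: eblume
+        ensure: present
+        login: true
+        superuser: true
+        passwordSecret:
+          name: blumeops-pg-eblume
+```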
+
+## Useful Commands
+
+```bash
+# List databases
+PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\l"
+
+# List users
+PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\du"
+
+# View CNPG cluster status
+kubectl -n databases get cluster blumeops-pg
+
+# View pod logs
+kubectl -n databases logs -f blumeops-pg-1
+```
+
+## Backup
+
+PostgreSQL data is backed up via borgmatic from indri using the `postgresql_databases` hook, which streams pg_dump directly to Borg for consistent backups.
+
+Borgmatic config (`~/.config/borgmatic/config.yaml`):
+```yaml
+postgresql_databases:
+    - name: miniflux
+      hostname: pg.tail8d86e.ts.net
+      port: 5432
+      username: borgmatic
+```
+
+The password is read from `~/.pgpass` (managed by the borgmatic ansible role).
+
+## ArgoCD Management
+
+```bash
+# Sync cluster changes
+argocd app sync blumeops-pg
+
+# Force reconcile
+kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
+```
+
+**Files:**
+- Cluster spec: `argocd/manifests/databases/blumeops-pg.yaml`
+- Tailscale service: `argocd/manifests/databases/service-tailscale.yaml`
+- Secrets: `secret-eblume.yaml.tpl`, `secret-borgmatic.yaml.tpl` (via `op inject`)
+
+## Credentials
+
+**1Password items:**
+- `guxu3j7ajhjyey6xxl2ovsl2ui` - eblume superuser password
+- `mw2bv5we7woicjza7hc6s44yvy` - borgmatic user password
+
+**CNPG-managed secrets:**
+- `blumeops-pg-app` - miniflux user (auto-generated password)
+- `blumeops-pg-eblume` - eblume superuser
+- `blumeops-pg-borgmatic` - borgmatic backup user
+
+## Log
+
+### Thu Jan 22 2026
+
+- Added CNPG metrics collection via Tailscale service at `cnpg-metrics.tail8d86e.ts.net:9187`
+- Updated PostgreSQL Grafana dashboard to use CNPG metric names (`cnpg_*` prefix)
+- Prometheus on indri now scrapes CNPG metrics directly
+
+### Mon Jan 19 2026 (P4)
+
+- **Retired brew PostgreSQL** - k8s CloudNativePG is now the only PostgreSQL
+- Renamed Tailscale hostname from `k8s-pg` to `pg` (canonical)
+- Removed postgresql ansible role from indri
+- Moved .pgpass management to borgmatic role
+- Updated borgmatic to backup only `pg.tail8d86e.ts.net`
+- Fixed table ownership issue: P3 restore created tables owned by eblume, transferred to miniflux
+
+### Mon Jan 19 2026 (P3)
+
+- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
+- Added borgmatic user to k8s-pg via CloudNativePG managed roles
+- Both brew and k8s PostgreSQL backed up by borgmatic during migration
+- Added Tailscale ACL: `tag:homelab` → `tag:k8s` on port 5432 for backup access
+
+### Fri Jan 16 2026
+
+- Initial setup with PostgreSQL 18 (brew)
+- Created miniflux database and user
+- Exposed via Tailscale at pg.tail8d86e.ts.net
--- a/docs/pulumi.md
+++ b/docs/pulumi.md
@ -0,0 +1,73 @@
+---
+id: pulumi
+aliases:
+  - pulumi
+  - tailnet-iac
+tags:
+  - blumeops
+---
+
+# Pulumi Tailnet IaC Management Log
+
+Pulumi manages the tail8d86e.ts.net tailnet configuration, including ACLs, tags, and DNS settings.
+
+## Architecture
+
+Two-layer approach:
+- **Layer 1 (Pulumi)**: Tailnet-wide config - ACLs, tags, DNS (this card)
+- **Layer 2 (Ansible)**: Node-local `tailscale serve` config - see `tailscale_serve` role
+
+## Service Details
+
+- State backend: Pulumi Cloud (https://app.pulumi.com/eblume/blumeops-tailnet)
+- Stack: `tail8d86e`
+- Config directory: `pulumi/` in blumeops repo
+- Policy file: `pulumi/policy.hujson` (HuJSON with comments)
+
+## Authentication
+
+Uses OAuth client stored in 1Password (blumeops vault):
+- Client configured with scopes: acl, dns, devices, services
+- Auto-applies `tag:blumeops` to IaC-managed resources
+
+## Useful Commands
+
+```bash
+# Preview changes
+mise run tailnet-preview
+
+# Apply changes
+mise run tailnet-up
+
+# Compare the policy file against current state (same preview task)
+mise run tailnet-preview
+
+# Pass additional args
+mise run tailnet-up -- --yes
+```
+
+## Making ACL Changes
+
+1. Edit `pulumi/policy.hujson` in the blumeops repo
+2. Run `mise run tailnet-preview` to see what will change
+3. Run `mise run tailnet-up` to apply
+4. Commit and push
+
+## What's Managed
+
+Currently managed by Pulumi:
+- ACL policy (`tailscale:index:Acl`)
+
+Can be added later:
+- DNS nameservers (`tailscale:index:DnsNameservers`)
+- DNS search paths (`tailscale:index:DnsSearchPaths`)
+- Tailnet settings (`tailscale:index:TailnetSettings`)
+
+## Log
+
+### Thu Jan 15 2026
+
+- Initial setup with Pulumi + Python
+- Imported existing ACL from Tailscale
+- State stored in Pulumi Cloud (free tier)
+- OAuth authentication via 1Password
--- a/docs/teslamate.md
+++ b/docs/teslamate.md
@ -0,0 +1,113 @@
+---
+id: teslamate
+aliases:
+  - teslamate
+  - tesla
+tags:
+  - blumeops
+---
+
+# TeslaMate
+
+TeslaMate is a self-hosted Tesla data logger running in Kubernetes (minikube on indri), collecting and visualizing vehicle data from the Tesla Owner API.
+
+## Service Details
+
+- URL: https://tesla.tail8d86e.ts.net
+- Namespace: `teslamate`
+- Image: `teslamate/teslamate:2.2.0`
+- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net)
+- ArgoCD app: `teslamate`
+
+## What TeslaMate Collects
+
+- Battery level, state of charge, range estimates
+- Charging sessions (location, energy, cost, duration)
+- Drives (distance, efficiency, routes)
+- Climate/HVAC usage
+- Software update history
+- Vampire drain analysis
+- Vehicle states (asleep, driving, charging, online)
+
+## Grafana Dashboards
+
+18 dashboards available in Grafana under the "TeslaMate" folder at https://grafana.tail8d86e.ts.net:
+
+- Overview, Charges, Drives, Efficiency, States
+- Battery Health, Vampire Drain, Statistics
+- Charge Level, Locations, Trip, Mileage
+- Drive Stats, Charging Stats, Projected Range
+- Timeline, Updates, Visited
+
+Dashboards use the `TeslaMate` PostgreSQL datasource (not Prometheus).
+
+## Useful Commands
+
+```bash
+# View logs
+kubectl --context=minikube-indri -n teslamate logs -f deployment/teslamate
+
+# Check pod status
+kubectl --context=minikube-indri -n teslamate get pods
+
+# Restart deployment
+kubectl --context=minikube-indri -n teslamate rollout restart deployment/teslamate
+
+# Sync from ArgoCD
+argocd app sync teslamate
+```
+
+## Credentials
+
+**1Password items (blumeops vault):**
+- `TeslaMate` - contains `db_password` and `api_enc_key` fields
+
+**Kubernetes secrets:**
+- `teslamate-db` (teslamate ns) - DATABASE_PASS for PostgreSQL connection
+- `teslamate-encryption` (teslamate ns) - ENCRYPTION_KEY for token encryption
+- `blumeops-pg-teslamate` (databases ns) - CloudNativePG managed role password
+- `grafana-teslamate-datasource` (monitoring ns) - Grafana datasource password
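+
+A sketch of how the two `teslamate` namespace secrets might be wired into the container environment. The variable names are TeslaMate's documented settings; the secret key names, database name, and database user are assumptions.
+
+```yaml
+# Hypothetical env fragment of the teslamate Deployment
+env:
+  - name: DATABASE_HOST
+    value: pg.tail8d86e.ts.net
+  - name: DATABASE_NAME
+    value: teslamate               # assumed database name
+  - name: DATABASE_USER
+    value: teslamate               # assumed database user
+  - name: DATABASE_PASS
+    valueFrom:
+      secretKeyRef:
+        name: teslamate-db
+        key: password              # assumed key name
+  - name: ENCRYPTION_KEY
+    valueFrom:
+      secretKeyRef:
+        name: teslamate-encryption
+        key: key                   # assumed key name
+```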
+
+## Backup
+
+TeslaMate data is backed up via [[borgmatic]]:
+- PostgreSQL database `teslamate` included in `borgmatic_postgresql_databases`
+- Backed up alongside miniflux to sifaka NAS
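+
+Assuming the `teslamate` entry mirrors the existing miniflux one (same host, port, and backup user), the rendered borgmatic config would look something like this:
+
+```yaml
+# Sketch of the rendered postgresql_databases section; teslamate entry is assumed
+postgresql_databases:
+    - name: miniflux
+      hostname: pg.tail8d86e.ts.net
+      port: 5432
+      username: borgmatic
+    - name: teslamate
+      hostname: pg.tail8d86e.ts.net
+      port: 5432
+      username: borgmatic
+```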
+
+## Tesla API Authentication
+
+TeslaMate uses Tesla's Owner API (not Fleet API) via OAuth:
+
+1. Access https://tesla.tail8d86e.ts.net
+2. Click "Sign in with Tesla"
+3. Complete OAuth flow in browser
+4. Tokens are encrypted with ENCRYPTION_KEY and stored in database
+5. TeslaMate automatically refreshes tokens as needed
+
+**Standalone OAuth tool:** If you need to manually obtain tokens, there's a Rust-based helper:
+- Mirror: https://forge.tail8d86e.ts.net/eblume/tesla_auth.git
+- Runs OAuth flow and outputs access/refresh tokens
+
+## Database Notes
+
+- TeslaMate requires PostgreSQL 17.3+ or 18.x
+- The `teslamate` user has superuser privileges (required for extension management during migrations)
+- Extensions used: `cube`, `earthdistance` (for geospatial calculations)
+
+## Related
+
+- [[1767747119-YCPO|BlumeOps]]
+- [[argocd|ArgoCD]]
+- [[postgresql|PostgreSQL]]
+- [[borgmatic|Borgmatic]]
+
+## Log
+
+### Fri Jan 23 2026
+
+- Initial deployment to Kubernetes
+- 18 Grafana dashboards imported from TeslaMate project
+- Upgraded CloudNativePG 1.25 -> 1.28 for major version upgrade support
+- Upgraded PostgreSQL 17.2 -> 18.1 (required for TeslaMate 2.2.0)
+- Tailscale Ingress at `tesla.tail8d86e.ts.net`
+- Backup configuration added to borgmatic
--- a/docs/transmission.md
+++ b/docs/transmission.md
@ -0,0 +1,100 @@
+---
+id: transmission
+aliases:
+  - transmission
+tags:
+  - blumeops
+---
+
+# Transmission Management Log
+
+Transmission is a BitTorrent daemon running in Kubernetes, primarily used to download large ZIM archives for [[kiwix|Kiwix]].
+
+## Service Details
+
+- URL: https://torrent.tail8d86e.ts.net
+- Namespace: `torrent`
+- Image: `lscr.io/linuxserver/transmission:latest`
+- ArgoCD app: `torrent`
+- Storage: NFS PVC from sifaka (`/volume1/torrents`)
+
+## Useful Commands
+
+```bash
+# View transmission logs
+kubectl --context=minikube-indri -n torrent logs -f deployment/transmission
+
+# Check RPC connectivity (from another pod)
+kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \
+  curl -s http://transmission.torrent.svc.cluster.local:9091/transmission/rpc
+
+# Sync from ArgoCD
+argocd app sync torrent
+```
+
+## ArgoCD Management
+
+Transmission is deployed via ArgoCD from `argocd/manifests/torrent/`:
+- `deployment.yaml` - Transmission container with NFS volume
+- `service.yaml` - ClusterIP service (port 9091)
+- `ingress-tailscale.yaml` - Tailscale Ingress for web UI
+- `pv-nfs.yaml` - NFS PersistentVolume
+- `pvc.yaml` - PersistentVolumeClaim
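+
+For orientation, a Tailscale operator Ingress for the web UI generally takes the shape below. This is a sketch rather than the contents of `ingress-tailscale.yaml`: the resource name is assumed, while the namespace, port, and hostname follow this card.
+
+```yaml
+# Hypothetical Tailscale Ingress for the Transmission web UI
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: transmission        # assumed resource name
+  namespace: torrent
+spec:
+  ingressClassName: tailscale
+  defaultBackend:
+    service:
+      name: transmission    # ClusterIP service from service.yaml
+      port:
+        number: 9091
+  tls:
+    - hosts:
+        - torrent           # becomes torrent.tail8d86e.ts.net
+```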
+
+## Storage Layout
+
+The NFS share on sifaka (`/volume1/torrents`) has this structure:
+- `/downloads/` - Active downloads and torrent metadata
+- `/downloads/complete/` - Completed downloads
+- `/config/` - Transmission configuration
+- `/watch/` - Watch directory for .torrent files
+
+Kiwix reads from `/downloads/complete/` to serve ZIM archives.
+
+## Integration with Kiwix
+
+The [[kiwix]] deployment includes a torrent-sync sidecar that:
+1. Reads the declarative ZIM torrent list from a ConfigMap
+2. Adds missing torrents to Transmission via RPC
+3. Runs on startup and every 30 minutes
+
+When downloads complete:
+1. Transmission moves files to `/downloads/complete/`
+2. The zim-watcher CronJob (in kiwix namespace) detects new ZIMs
+3. Kiwix deployment is restarted to pick up new archives
+
+## Monitoring
+
+**TODO:** Write custom transmission exporter. Existing exporters (`metalmatze/transmission-exporter`, `sandrotosi/simple_transmission_exporter`) are incompatible with Transmission 4's changed JSON API (type mismatches in `lastScrapeTimedOut` field).
+
+Current monitoring via web UI at https://torrent.tail8d86e.ts.net:
+- Active/seeding/paused torrent counts
+- Upload/download speeds
+- Disk usage
+
+Basic uptime monitoring via blackbox probe in [[alloy|Alloy k8s]] (see Services Health dashboard).
+
+## Log
+
+### 2026-01-22
+
+- Attempted to add `metalmatze/transmission-exporter` sidecar for Prometheus metrics
+- Exporter failed with JSON parsing errors - incompatible with Transmission 4 API changes
+- Removed exporter sidecar, dashboard, and Prometheus scrape config
+- Added basic HTTP probe via Alloy k8s blackbox exporter instead
+- Deleted stale `transmission.prom` textfile from indri
+
+### 2026-01-21 (P6)
+- **Migrated to Kubernetes** (Phase 6 of k8s migration)
+- NFS PersistentVolume for storage on sifaka
+- Tailscale Ingress at `torrent.tail8d86e.ts.net`
+- RPC accessible to kiwix namespace for torrent sync
+- Moved existing ZIM files to `/downloads/complete/` for seeding
+- Retired ansible transmission role from indri
+
+### 2026-01-14
+- Added transmission role to ansible playbook
+- Integrated with kiwix role for torrent-based ZIM downloads
+- Initial setup with transmission-cli via homebrew
+- Managed via brew services on port 9091
+- Metrics collected via textfile exporter
--- a/docs/zot.md
+++ b/docs/zot.md
@ -0,0 +1,112 @@
+---
+id: zot
+aliases:
+  - zot
+  - container-registry
+tags:
+  - blumeops
+---
+
+# Zot Registry Management Log
+
+Zot is an OCI-native container registry running on Indri, providing:
+1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
+2. Private image storage for custom-built containers
+
+## Service Details
+
+- URL: https://registry.ops.eblu.me
+- Local port: 5050
+- Data directory: ~/zot
+- Config: ~/.config/zot/config.json
+- Managed via: mcquack LaunchAgent
+
+## Namespace Convention
+
+| Path | Source |
+|------|--------|
+| `registry.../docker.io/*` | Cached from Docker Hub |
+| `registry.../ghcr.io/*` | Cached from GHCR |
+| `registry.../quay.io/*` | Cached from Quay |
+| `registry.../blumeops/*` | Private images (yours) |
+
+## How It Works
+
+### Pull-Through Cache (Automatic)
+
+When [[minikube]] pulls an image like `docker.io/library/nginx:latest`:
+1. Containerd checks zot first (via `host.minikube.internal:5050`)
+2. If zot has it cached, returns immediately
+3. If not, zot fetches from upstream, caches it, returns to k8s
+
+Cached images appear under their original registry path (e.g., `docker.io/library/nginx`).
+
+### Private Images (Manual Push)
+
+Build and push from gilbert using podman:
+```bash
+# Build
+podman build -t registry.ops.eblu.me/blumeops/myapp:v1 .
+
+# Push to zot
+podman push registry.ops.eblu.me/blumeops/myapp:v1
+
+# Use in k8s manifest
+image: registry.ops.eblu.me/blumeops/myapp:v1
+```
+
+Private images go under `blumeops/*` namespace. Example: the devpi container is at `registry.ops.eblu.me/blumeops/devpi:latest`.
+
+### Security Model
+
+**Network access only** - no authentication configured. Anyone who can reach zot via Tailscale ACL can push/pull any image. Defense is the tailnet boundary.
+
+Zot supports htpasswd/LDAP/OIDC auth if needed in the future.
+
+## Minikube Integration
+
+The [[minikube]] cluster uses zot as a registry mirror via containerd configuration. Managed by the ansible `minikube` role.
+
+From inside minikube, zot is at `host.minikube.internal:5050`. Containerd tries the mirror first, falls back to upstream if not cached.
+
+Mirrors configured for: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io`
+
+## Useful Commands
+
+```bash
+# List all cached/pushed images
+curl -s http://indri:5050/v2/_catalog | jq
+
+# List tags for an image
+curl -s http://indri:5050/v2/blumeops/devpi/tags/list | jq
+
+# Check service status
+ssh indri 'launchctl list | grep zot'
+
+# View logs
+ssh indri 'tail -f ~/Library/Logs/mcquack.zot.err.log'
+```
+
+## Log
+
+### 2026-01-25
+- **Migrated from Tailscale serve to Caddy** - now accessible at `registry.ops.eblu.me`
+- Retired `tailscale_serve` ansible role (no longer needed)
+- Updated minikube containerd config to use new URL
+- Updated CI workflows and mise tasks
+- Old URL (`registry.tail8d86e.ts.net`) deprecated
+
+### 2026-01-21
+- Verified full workflow: podman build on gilbert → push to zot → k8s pull
+- Documented security model (network-only auth via Tailscale ACL)
+- Updated minikube integration: now uses containerd (docker driver) instead of CRI-O (podman driver)
+- Mirror endpoint changed from `host.containers.internal:5050` to `host.minikube.internal:5050`
+
+### 2026-01-19
+- Integrated with minikube as CRI-O registry mirror
+- All k8s image pulls now go through zot cache automatically
+
+### 2026-01-18
+- Initial setup for k8s migration Phase 0
+- Configured pull-through cache for Docker Hub, GHCR, Quay
+- Exposed via Tailscale service at registry.tail8d86e.ts.net
--- a/mise-tasks/zk-docs
+++ b/mise-tasks/zk-docs
@ -3,11 +3,12 @@

 set -euo pipefail

-ZK_DIR="$HOME/code/personal/zk"
-MAIN_CARD="$ZK_DIR/1767747119-YCPO.md"
+# Blumeops docs now live in the repo itself (symlinked into zk)
+DOCS_DIR="$(cd "$(dirname "$0")/.." && pwd)/docs"
+MAIN_CARD="$DOCS_DIR/1767747119-YCPO.md"

 # Find all files tagged with blumeops (excluding main card)
-other_cards=$(grep -l '^  - blumeops$' "$ZK_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort)
+other_cards=$(grep -l '^  - blumeops$' "$DOCS_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort)

 # Concatenate: main card first, then others
 # Pass through any args to bat (e.g., --style=header --color=never --decorations=always)