Add docs/ directory with blumeops zk cards

Move 21 blumeops-tagged zettelkasten cards from ~/code/personal/zk/
to docs/ in this repository. These files are symlinked back into the
zk at ~/code/personal/zk/blumeops for seamless obsidian.nvim integration.

This enables:
- Git-managed documentation in the blumeops repo
- Preserved wiki links between blumeops docs
- obsidian-sync isolation (docs don't sync to other devices)
- Direct editing via obsidian.nvim with the blumeops workspace

Also updates zk-docs mise task to read from local docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-02-02 19:09:19 -08:00
commit a7d771d945
22 changed files with 2416 additions and 3 deletions

247
docs/1767747119-YCPO.md Normal file

@ -0,0 +1,247 @@
---
id: 1767747119-YCPO
aliases:
- blumeops
- BlumeOps
tags:
- blumeops
---
BlumeOps, aka Blue Mops, refers to my own personal computing operations stack.
Source code: https://forge.ops.eblu.me/eblume/blumeops (mirrored to https://github.com/eblume/blumeops)
# Infrastructure
| Host | Description | Notes |
|----------------------------------|--------------------------|----------------------------------------------------|
| **[[indri|Indri]]** | Mac Mini M1, 2020 | Primary server, 2TB internal disk |
| **[Sifaka](https://nas.ops.eblu.me)** | Synology NAS | 10.9TB RAID 5, backup target |
| **Gilbert** | 13" MacBook Air M4, 2025 | Primary workstation |
| **Mouse** | 13" MacBook Air M2 | Allison's laptop |
| **[UniFi](https://192.168.1.1)** | UniFi Express 7 | Home WiFi network ([cloud](https://unifi.ui.com)) |
| **Dwarf** | iPad Air | Employer-provided, off tailnet |
All devices are connected via [Tailscale](https://login.tailscale.com/) tailnet `tail8d86e.ts.net`.
## Tailscale Access Control
ACLs are managed via Pulumi in `pulumi/policy.hujson`. See [[pulumi]] for deployment commands.
**Important lesson learned:**
- Don't tag user-owned devices (like gilbert) - tagging converts them to "tagged devices" which lose user identity and break user-based SSH rules
### Groups
| Group | Members | Purpose |
|---------------------|--------------------------------------------|----------------------------------|
| `group:allisonflix` | blume.erich@gmail.com, acmdavis@gmail.com | Jellyfin media access |
### Device Tags
| Tag | Devices | Purpose |
|------------------|-------------|--------------------------------------------|
| `tag:homelab` | indri | Server infrastructure |
| `tag:nas` | sifaka | Network-attached storage for backups |
| `tag:blumeops` | indri, sifaka | Resources managed by Pulumi IaC |
| `tag:registry` | indri | Container registry access |
| `tag:k8s-api` | indri | Kubernetes API server access |
### Access Matrix
| Source | Kiwix | Forge | PyPI | Miniflux | PostgreSQL | NAS | Grafana | Loki |
|--------------------------|-------|-------|------|----------|------------|-----|---------|------|
| `autogroup:admin` | Y | Y | Y | Y | Y | Y | Y | Y |
| `autogroup:member` | Y | Y | Y | Y | Y | - | - | - |
| `tag:homelab` | - | - | - | - | - | Y | - | - |
Notes:
- **Admins** - full access to all services via `autogroup:admin`
- **Allison** (`acmdavis@gmail.com`) - member services only, no Grafana/Loki/NAS
### SSH Access
| Source | Destinations | Auth |
|-------------------------|-----------------|-------------|
| `autogroup:member` | `autogroup:self`| check |
| `autogroup:admin` | `tag:homelab` | check (12h) |
| `autogroup:admin` | `tag:nas` | check (12h) |
# Services
Services are accessible via two DNS domains:
- **`*.ops.eblu.me`** - Caddy reverse proxy (reachable from k8s pods, docker containers, and tailnet)
- **`*.tail8d86e.ts.net`** - Tailscale MagicDNS (tailnet clients only, not from k8s/docker)
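As a rough illustration of the split above, a small shell helper can pick the hostname a given caller should use. The function name `service_url` and the `pod`/`container`/`tailnet` caller labels are hypothetical and not part of the blumeops repo.
```bash
# Hypothetical helper: print the URL a caller should use for a service.
# Callers inside k8s pods or docker containers cannot reach MagicDNS names,
# so they get the Caddy domain; tailnet clients may use either domain.
service_url() {
  local name="$1" caller="${2:-tailnet}"
  case "$caller" in
    pod|container) printf 'https://%s.ops.eblu.me\n' "$name" ;;
    tailnet)       printf 'https://%s.tail8d86e.ts.net\n' "$name" ;;
    *)             echo "unknown caller: $caller" >&2; return 1 ;;
  esac
}
```
A service may use a different short name on each domain, so treating `name` as shared between the two is a simplification.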
## Caddy Services (`*.ops.eblu.me`)
Caddy proxies to k8s services via their Tailscale endpoints (traffic stays local on indri).
Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs work - use ops.eblu.me for access from pods/containers.
| Service | URL | Description | Management Log |
|----------------|-----------------------------------|------------------------------------|-----------------|
| **Homepage** | https://go.ops.eblu.me | Service dashboard / start page | — |
| **Forgejo** | https://forge.ops.eblu.me | Git hosting (SSH: port 2222) | [[forgejo]] |
| **Registry** | https://registry.ops.eblu.me | OCI container registry (Zot) | [[zot]] |
| **Sifaka NAS** | https://nas.ops.eblu.me | Synology NAS dashboard | — |
| **Grafana** | https://grafana.ops.eblu.me | Dashboards & observability (k8s) | [[grafana]] |
| **ArgoCD** | https://argocd.ops.eblu.me | GitOps continuous delivery (k8s) | [[argocd]] |
| **Prometheus** | https://prometheus.ops.eblu.me | Metrics collection (k8s) | [[prometheus]] |
| **Loki** | https://loki.ops.eblu.me | Log aggregation (k8s) | [[loki]] |
| **Miniflux** | https://feed.ops.eblu.me | RSS/Atom feed reader (k8s) | [[miniflux]] |
| **PyPI** | https://pypi.ops.eblu.me | PyPI caching proxy (devpi, k8s) | [[pypi]] |
| **Kiwix** | https://kiwix.ops.eblu.me | Offline Wikipedia & ZIM (k8s) | [[argocd]] |
| **Torrent** | https://torrent.ops.eblu.me | BitTorrent daemon web UI (k8s) | [[argocd]] |
| **TeslaMate** | https://tesla.ops.eblu.me | Tesla data logger (k8s) | [[teslamate]] |
| **Immich** | https://photos.ops.eblu.me | Photo management (k8s Helm, CNPG) | [[argocd]] |
| **DJ** | https://dj.ops.eblu.me | Music streaming server (Navidrome) | [[navidrome]] |
| **PostgreSQL** | pg.ops.eblu.me:5432 | Database server (k8s CloudNativePG)| [[postgresql]] |
## Tailscale-Only Services (`*.tail8d86e.ts.net`)
These services are only accessible via Tailscale (not from k8s pods/containers):
| Service | URL | Description | Management Log |
|----------------|-----------------------------------|------------------------------------|-----------------|
| **Kubernetes** | https://k8s.tail8d86e.ts.net | Minikube API (TCP passthrough) | [[minikube]] |
| **Jellyfin** | https://jellyfin.ops.eblu.me | Media server (VideoToolbox HW) | [[jellyfin]] |
Supporting services (not directly user-facing):
| Service | Description | Management Log |
|---------------------|---------------------------------------|------------------|
| **Alloy (indri)** | Metrics & logs collector (indri host) | [[alloy]] |
| **Alloy (k8s)** | Pod log collection & service probes | [[alloy]] |
| **Kube-state-metrics** | K8s resource metrics (pods, deployments) | [[prometheus]] |
| **Borgmatic** | Daily backups to Sifaka NAS (2:00 AM) | [[borgmatic]] |
## Port Map (Indri)
| Port | Service | Protocol | Binding | Notes |
|-------|---------------|----------|-------------|--------------------------------------------|
| 443 | Caddy | HTTPS | 0.0.0.0 | Reverse proxy for `*.ops.eblu.me` |
| 2222 | Caddy L4 | TCP | 0.0.0.0 | SSH proxy → Forgejo (localhost:2200) |
| 5432 | Caddy L4 | TCP | 0.0.0.0 | PostgreSQL proxy → k8s pg |
| 2200 | Forgejo SSH | TCP | localhost | Built-in SSH server |
| 3001 | Forgejo | HTTP | localhost | Web UI (proxied by Caddy) |
| 5050 | Zot | HTTP | localhost | Registry API (proxied by Caddy) |
| 8096 | Jellyfin | HTTP | localhost | Media server (proxied by Caddy) |
| 44491 | K8s API | HTTPS | 0.0.0.0 | Minikube API server (via Tailscale k8s.*) |
# Service Management
## Pulumi (Tailnet IaC)
Tailnet-wide configuration (ACLs, tags, DNS) is managed via Pulumi. See [[pulumi]] for details.
```bash
mise run tailnet-preview # preview ACL changes
mise run tailnet-up # apply ACL changes
```
Edit `pulumi/policy.hujson` to modify ACLs or add new tags.
## Ansible
Services on Indri are managed via ansible. Playbooks live in the `ansible/` directory of the blumeops repo:
```bash
mise run provision-indri # runs ansible/playbooks/indri.yml
mise run indri-services-check # checks health of all services
```
Run with `--check --diff` first to preview changes, or target specific services:
```bash
mise run provision-indri -- --check --diff # dry run
mise run provision-indri -- --tags alloy # only alloy
mise run provision-indri -- --tags zot,borgmatic # multiple tags
```
## Adding a New Service
### Indri Services (via Caddy)
For services running directly on indri that need to be accessible from k8s pods:
1. Run the service on localhost (e.g., localhost:3000)
2. Add service to `ansible/roles/caddy/defaults/main.yml` under `caddy_services`
3. Run `mise run provision-indri -- --tags caddy`
4. Add backup entry in borgmatic role if needed
DNS is handled by a wildcard record (`*.ops.eblu.me` → indri's Tailscale IP) managed via Pulumi in `pulumi/gandi/`.
Access via `https://foo.ops.eblu.me`.
### K8s Services (via Tailscale Ingress)
For services running in minikube:
1. Create Kubernetes manifests in `argocd/manifests/<service>/`
2. Add ArgoCD Application in `argocd/apps/<service>.yaml`
3. Add Tailscale Ingress annotation for `*.tail8d86e.ts.net` hostname
4. Add Homepage annotations to the Ingress for dashboard discovery (see below)
5. Add Caddy proxy entry in `ansible/roles/caddy/defaults/main.yml`
6. Sync via ArgoCD: `argocd app sync <service>`
Access via `https://foo.ops.eblu.me` (preferred) or `https://foo.tail8d86e.ts.net`.
**Note:** K8s services using Tailscale Ingress are NOT accessible from other k8s pods or docker containers. Use Caddy (`*.ops.eblu.me`) if pod-to-service communication is needed.
**Homepage annotations** for automatic dashboard discovery:
```yaml
annotations:
  gethomepage.dev/enabled: "true"
  gethomepage.dev/name: "My Service"
  gethomepage.dev/group: "Apps"
  gethomepage.dev/icon: "myservice.png"
  gethomepage.dev/description: "Short description"
  gethomepage.dev/href: "https://myservice.ops.eblu.me"
  gethomepage.dev/pod-selector: "app=myservice"
```
Icons use [Dashboard Icons](https://github.com/walkxcode/dashboard-icons) format (e.g., `grafana.png`, `prometheus.png`). The `pod-selector` annotation enables pod status badges on the dashboard.
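To keep the annotations consistent across services, the block above could be generated from a few arguments. This is only a sketch: `homepage_annotations` is a hypothetical helper, and it assumes the icon, hostname, and pod label all follow the service slug, as they do in the example.
```bash
# Hypothetical helper: print a Homepage annotation block for one service.
# Arguments: <slug> <display name> <group> <description>
# Assumes icon, href, and pod label are all derived from the slug.
homepage_annotations() {
  local slug="$1" name="$2" group="$3" desc="$4"
  cat <<EOF
annotations:
  gethomepage.dev/enabled: "true"
  gethomepage.dev/name: "${name}"
  gethomepage.dev/group: "${group}"
  gethomepage.dev/icon: "${slug}.png"
  gethomepage.dev/description: "${desc}"
  gethomepage.dev/href: "https://${slug}.ops.eblu.me"
  gethomepage.dev/pod-selector: "app=${slug}"
EOF
}
```
Running `homepage_annotations grafana "Grafana" "Observability" "Dashboards"` prints a block that can be pasted under the Ingress `metadata`.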
## Secrets Management
Kubernetes secrets are managed via [[external-secrets|External Secrets Operator]], which syncs from 1Password via 1Password Connect.
To add a secret to a k8s service:
1. Ensure the 1Password item exists in the `blumeops` vault
2. Create an `ExternalSecret` manifest in the service's directory
3. Reference the `onepassword-blumeops` ClusterSecretStore
4. Sync via ArgoCD
See [[external-secrets]] for detailed usage and bootstrap instructions.
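A minimal sketch of step 2, rendering an `ExternalSecret` that points at the `onepassword-blumeops` store. The helper name `render_external_secret`, the `refreshInterval`, and the `apiVersion` are assumptions; check the API version against the External Secrets Operator release actually installed.
```bash
# Hypothetical helper: render an ExternalSecret for one 1Password field.
# Arguments: <k8s secret name> <namespace> <1Password item title> <field>
render_external_secret() {
  local name="$1" ns="$2" item="$3" field="$4"
  cat <<EOF
apiVersion: external-secrets.io/v1beta1  # assumption: match the installed ESO version
kind: ExternalSecret
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  refreshInterval: 1h  # assumption: any interval works here
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-blumeops
  target:
    name: ${name}
  data:
    - secretKey: ${field}
      remoteRef:
        key: "${item}"
        property: ${field}
EOF
}
```
For example, `render_external_secret myservice-secret myservice "My Item" password` (all placeholder names) prints a manifest that can be saved in the service's manifest directory and synced via ArgoCD.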
# Notes
## Go DNS Resolution on macOS
**Important lesson learned (2026-01-22):**
Go programs built with `CGO_ENABLED=0` (pure Go) use a DNS resolver that reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files. This breaks Tailscale MagicDNS resolution.
**Solution:** Build Go programs with `CGO_ENABLED=1` to use the macOS native resolver. This is why [[alloy|Grafana Alloy]] is built from source rather than using the Homebrew bottle.
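One way to check how an existing binary was built is to inspect its embedded build settings with `go version -m`. The helper below is a sketch (the name `built_with_cgo` is hypothetical) and assumes a Go toolchain recent enough to record build settings in the binary.
```bash
# Read `go version -m <binary>` output on stdin and succeed only if the
# build settings record CGO_ENABLED=1.
built_with_cgo() {
  grep -Eq '^[[:space:]]*build[[:space:]]+CGO_ENABLED=1[[:space:]]*$'
}

# Usage (requires the Go toolchain on the machine running the check):
#   go version -m ~/.local/bin/alloy | built_with_cgo && echo "cgo enabled"
```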
## Remote Kubernetes Access (from Gilbert)
The minikube cluster on indri is accessible from gilbert via a Tailscale service.
Cluster was created with `--apiserver-names=k8s.tail8d86e.ts.net,indri --listen-address=0.0.0.0`.
API server exposed at `https://k8s.tail8d86e.ts.net` via TCP passthrough (preserves mTLS).
**Fish abbreviations** (in `~/.config/fish/config.fish`):
- `ki` -> `kubectl --context=minikube-indri`
- `k9i` -> `k9s --context=minikube-indri`
- `k9` -> `k9s`
```bash
# Quick access via abbreviations
ki get nodes
k9i
# Or explicitly set context
kubectl config use-context minikube-indri
kubectl get nodes
```
Credentials are stored in 1Password and fetched via exec credential plugin. See [[minikube]] for details.

136
docs/1768246525-RVRY.md Normal file

@ -0,0 +1,136 @@
---
id: 1768246525-RVRY
aliases:
- forgejo
- forge
tags:
- blumeops
- forgejo
- git
- scm
- forge
---
# Mon Jan 12 11:35
```fish
brew install forgejo
brew --prefix forgejo
/opt/homebrew/opt/forgejo
brew services start forgejo
==> Successfully started `forgejo` (label: homebrew.mxcl.forgejo)
```
From the service definition I can see that this runs as:
```bash
/opt/homebrew/opt/forgejo/bin/forgejo web --work-path /opt/homebrew/var/forgejo > /opt/homebrew/var/log/forgejo.log 2> /opt/homebrew/var/log/forgejo.log
```
It sounds from the docs like this means the config file should live at:
```
/opt/homebrew/var/forgejo/custom/conf/app.ini
```
Ah, based on the logs, it looks like forgejo has picked port 3000 which is used by grafana:
```
lsof -nP -iTCP:3000 -sTCP:LISTEN
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
grafana 1530 erichblume 15u IPv6 0x4acfad8b21dcb063 0t0 TCP *:3000 (LISTEN)
```
Ok I've set a basic config for port 3001, and then gone through the basic app setup. Looks like it's working! Not sure how SSH works yet though. Let's get this service registered.
Ok so the next issue is that I want to use ssh as my primary git interface, and
I want that to look to users like I'm using port 22 but I want to host it on
indri which has its own separate ssh setup. Hmm. Let's tell forgejo to use port
2200. Ah perfect, we can set SSH_PORT to 22 and SSH_LISTEN_PORT to 2200.
Hmm, let's stop running this as me and run as a new user, 'forgejo'.
```fish
sudo sysadminctl -addUser forgejo -system -shell /usr/bin/false
sudo chown -R forgejo:staff /opt/homebrew/var/forgejo
```
Ok, I think I need to switch all my services on this host over to a services file.
Wow, missing from the above is like 4 hours of deep diving into the particulars of tailscale service definition hosting. In the end, I never got a services file to work - and yes, I did remember to advertise! Adding to the complexity is that I didn't discover until the end that you can't do "hairpinning", i.e., you CANNOT use the tailnet service name from the host doing that hosting. I probably had it fixed at some point hours ago and ruled it out because I didn't know about the hairpinning issue. So anyway... what ended up working was to just use the cli:
```fish
tailscale serve --service="svc:forge" --tcp=22 tcp://localhost:2200
tailscale serve --service="svc:forge" --https=443 http://localhost:3001
```
That's it. Nothing else needed, worked right away. Sheesh. (Ok there was also a
solid hour spent on permission issues... I honestly don't know how it's working
now, as there is now a `forgejo` user and the config says to use it but the
files are all owned by `erichblume:staff` but with group permissions set... in
any case, it friggin' works. So I'm happy.)
# Configuration (Ansible-Managed)
As of 2026-01-23, the `app.ini` is managed by ansible:
- Template: `ansible/roles/forgejo/templates/app.ini.j2`
- Secrets fetched from 1Password in playbook pre_tasks
- Secrets item: "Forgejo Secrets" in blumeops vault (fields: `lfs-jwt-secret`, `internal-token`, `oauth2-jwt-secret`, `runner_reg`)
Deploy config changes:
```bash
mise run provision-indri -- --tags forgejo
```
# Forgejo Actions (CI/CD)
## Runner (k8s)
The Forgejo runner runs in Kubernetes with Docker-in-Docker (DinD) for container builds.
**Architecture:**
- Runner daemon + DinD sidecar in a single pod
- Jobs execute in containers using the `k8s` label
- DinD exposes Docker API on `tcp://127.0.0.1:2375`
- Pods reach `*.ops.eblu.me` services via Caddy reverse proxy
**Components:**
- ArgoCD app: `argocd/apps/forgejo-runner.yaml`
- Manifests: `argocd/manifests/forgejo-runner/`
- Job image: `registry.ops.eblu.me/blumeops/forgejo-runner` (Node.js + Docker CLI)
- Job image source: `containers/forgejo-runner/`
**Deployment:**
```bash
# Apply secret (contains runner token from 1Password)
op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
# Sync via ArgoCD
argocd app sync forgejo-runner
```
**View logs:**
```bash
kubectl --context=minikube-indri logs -n forgejo-runner -l app=forgejo-runner -c runner
```
## Container Build Workflow
Container images are built via `.forgejo/workflows/build-container.yaml`, triggered by tags matching `<container>-v<version>`.
**Release a container:**
```bash
mise run container-list # See available containers
mise run container-tag-and-release nettest v1.0.0 # Tag and trigger build
```
**Test container** (`containers/nettest/`): Network connectivity test for debugging CI/CD.
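The `<container>-v<version>` tag convention can be split back into its parts with plain parameter expansion. This is a sketch, not the logic used by the actual workflow; `parse_release_tag` is a hypothetical name, and it assumes the version is whatever follows the last `-v` and starts with a digit.
```bash
# Split a release tag of the form <container>-v<version> into its parts.
# Prints "<container> <version>", or fails if the tag does not match.
parse_release_tag() {
  local tag="$1" container version
  container="${tag%-v*}"   # drop the shortest trailing "-v..." piece
  version="${tag##*-v}"    # keep what follows the last "-v"
  # No "-v" separator, or nothing before it: not a release tag.
  if [ "$container" = "$tag" ] || [ -z "$container" ]; then
    echo "not a release tag: $tag" >&2
    return 1
  fi
  # The version must start with a digit (so "my-video" is rejected).
  case "$version" in
    [0-9]*) printf '%s %s\n' "$container" "$version" ;;
    *)      echo "not a release tag: $tag" >&2; return 1 ;;
  esac
}
```
Container names that contain hyphens stay intact, because only the last `-v` is treated as the separator.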
## Workflows
Workflows live in `.forgejo/workflows/` (not `.github/workflows/`).
**Important**: Use `github.*` context variables, not `gitea.*`. Forgejo supports both at runtime, but:
1. The Forgejo web UI schema validator only recognizes `github.*`
2. The `actionlint` pre-commit hook validates workflows locally (catches errors before push)
Separately, pass untrusted inputs (like `github.head_ref`) through env vars for security.
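For the untrusted-input rule above, the script side of a step might look like the sketch below. It assumes the workflow maps `github.head_ref` to an environment variable named `HEAD_REF` (a hypothetical name) in the step's `env:` block, so the value reaches the shell as data rather than as script text.
```bash
# Hypothetical step script. HEAD_REF is assumed to arrive through the step's
# env mapping of github.head_ref, never through inline interpolation.
validate_ref() {
  case "$1" in
    # Reject empty values, option-like values, and unexpected characters.
    ''|-*|*[!A-Za-z0-9._/-]*) return 1 ;;
  esac
  return 0
}

# Usage inside the step:
#   validate_ref "$HEAD_REF" || { echo "refusing suspicious ref" >&2; exit 1; }
#   echo "building ref: $HEAD_REF"
```
The allow-list is deliberately narrower than what git permits in branch names; widen it only if real branch names need it.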
## Runner Token
Stored in 1Password "Forgejo Secrets" item, field `runner_reg`.
To create a new token:
1. Go to https://forge.ops.eblu.me/admin/actions/runners
2. Click "Create new Runner"
3. Copy the token and update 1Password

95
docs/1768283761-TRXN.md Normal file

@ -0,0 +1,95 @@
---
id: 1768283761-TRXN
aliases:
- prometheus
tags:
- blumeops
---
# Prometheus Management Log
Prometheus provides metrics storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure, running in Kubernetes (minikube on indri).
## Service Details
- URL: https://prometheus.tail8d86e.ts.net
- Namespace: `monitoring`
- Image: `prom/prometheus:v3.2.1`
- ArgoCD app: `prometheus`
- Storage: 50Gi PVC
## Data Sources
### Remote Write (from Alloy)
- Indri system metrics via [[alloy|Grafana Alloy]] remote_write
- Textfile metrics: minikube, borgmatic, zot, jellyfin
### Scrape Targets
- `sifaka:9100` - Synology NAS (node_exporter in Docker)
- `cnpg-metrics.tail8d86e.ts.net:9187` - CloudNativePG PostgreSQL metrics
- `kube-state-metrics.monitoring.svc:8080` - Kubernetes resource metrics (pods, deployments, etc.)
## Useful Commands
```bash
# View logs
kubectl --context=minikube-indri -n monitoring logs -f prometheus-0
# Check targets
curl -s https://prometheus.tail8d86e.ts.net/api/v1/targets | jq '.data.activeTargets[].scrapeUrl'
# Sync from ArgoCD
argocd app sync prometheus
```
## ArgoCD Management
Prometheus is deployed via ArgoCD from `argocd/manifests/prometheus/`:
- `statefulset.yaml` - StatefulSet with 50Gi PVC
- `configmap.yaml` - Prometheus configuration
- `service.yaml` - ClusterIP service
- `ingress-tailscale.yaml` - Tailscale Ingress
## Log
### Thu Jan 22 2026 (observability cleanup)
- Added kube-state-metrics scrape target for k8s resource metrics
- Enhanced Minikube dashboard with namespace filtering and resource usage panels
  - Uses `kube_pod_info`, `kube_pod_container_resource_requests`, etc.
### Thu Jan 22 2026 (later)
- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet
- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net`
- Remote write endpoint now at k8s service, Alloy updated to push there
- Retired ansible prometheus role from indri
- Added ACL grant for `tag:homelab` → `tag:k8s` on port 443 for Alloy access
### Thu Jan 22 2026
Added CNPG PostgreSQL metrics scraping. The CloudNativePG operator exposes Prometheus metrics on port 9187. Exposed via Tailscale at `cnpg-metrics.tail8d86e.ts.net:9187` and added to scrape_configs as job `cnpg-postgres`.
### Thu Jan 15 2026
Prometheus now accepts metrics via remote_write from [[alloy|Grafana Alloy]]. The `--web.enable-remote-write-receiver` flag was added to enable this.
Indri metrics are no longer scraped - they're pushed by Alloy. Sifaka still uses traditional scraping via node_exporter running in Docker on the Synology.
### Tue Jan 13 2026
Prometheus is now managed via ansible in [[1767747119-YCPO|blumeops]]. Configuration files are templated from the ansible role.
### Mon Jan 12 2026 21:56
Prometheus was stood up about a week ago at this point. I am currently renaming
`localhost` to `indri` in the scrape_configs. While I'm here I'm going to see
if I can add Synology stats.
I'm adding Container Manager to Sifaka now. I should probably have a Sifaka
management log, but not yet. Downloaded prom/node-exporter and made a container
for it. Using the latest tag because I'm nasty.
Done. Adding to scrape configs.
Ok, it didn't like the indri hostname. Could probably fix somehow with either magicdns or /etc/hosts but for now, I'm using `relabel_configs`. This is working. Gotta go to bed.

149
docs/1768457769-LOCK.md Normal file

@ -0,0 +1,149 @@
---
id: 1768457769-LOCK
aliases:
- pypi
- devpi
tags:
- blumeops
---
# PyPI / devpi Management Log
PyPI caching proxy running in Kubernetes (minikube on indri) via devpi-server.
## Service Details
- URL: https://pypi.tail8d86e.ts.net
- Namespace: devpi
- Image: registry.tail8d86e.ts.net/blumeops/devpi:latest (custom image with devpi-server + devpi-web)
- ArgoCD app: devpi
- Storage: 50Gi PVC
## Useful Commands
```bash
# View logs
kubectl --context=minikube-indri -n devpi logs -f statefulset/devpi
# Restart pod
kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
# Check health
curl https://pypi.tail8d86e.ts.net/+api
# Sync from ArgoCD
argocd app sync devpi
```
## ArgoCD Management
Devpi is deployed via ArgoCD from `argocd/manifests/devpi/`:
- `statefulset.yaml` - StatefulSet with 50Gi PVC
- `service.yaml` - ClusterIP service
- `ingress-tailscale.yaml` - Tailscale Ingress for external access
- `Dockerfile` - Custom image with startup script
- `start.sh` - Auto-initialization script
## Users and Indices
### Structure
- `root/pypi` - PyPI mirror/cache (auto-created)
- `eblume/dev` - Private packages index (inherits from root/pypi)
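Each index is served under `<user>/<index>/`, with the pip-compatible listing at `+simple/`. A small sketch for building those URLs follows; `devpi_simple_url` is a hypothetical helper, and the package name in the usage comment is a placeholder.
```bash
# Build the simple-index URL for a devpi index (defaults to the PyPI mirror).
devpi_simple_url() {
  local user="${1:-root}" index="${2:-pypi}"
  printf 'https://pypi.tail8d86e.ts.net/%s/%s/+simple/\n' "$user" "$index"
}

# Usage: install from the private index, which inherits from root/pypi
#   pip install --index-url "$(devpi_simple_url eblume dev)" some-package
```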
### Creating a User and Index
```bash
# Login as root
uvx devpi use https://pypi.tail8d86e.ts.net
uvx devpi login root
# Create user (prompts for password - store in 1Password)
uvx devpi user -c USERNAME email=EMAIL
# Create index inheriting from PyPI mirror
uvx devpi index -c USERNAME/dev bases=root/pypi
```
### Uploading Packages (with uv)
```bash
# Store credentials (one-time, prompts for username/password)
uv auth login https://pypi.tail8d86e.ts.net
# Build and publish
cd ~/code/personal/your-package
uv build
uv publish --publish-url https://pypi.tail8d86e.ts.net/eblume/dev/
```
Note: The "trusted publishing failed" warning is expected (devpi doesn't support OIDC).
### Uploading Packages (with devpi-client)
```bash
# Login as the user
uvx devpi login USERNAME
# Use the index
uvx devpi use eblume/dev
# Upload from project directory
uvx devpi upload
```
## Client Configuration
On workstations, configure pip to use the proxy.
**pip.conf** (`~/.config/pip/pip.conf`):
```ini
[global]
index-url = https://pypi.tail8d86e.ts.net/root/pypi/+simple/
trusted-host = pypi.tail8d86e.ts.net
```
After creating/editing, track with chezmoi:
```bash
chezmoi add ~/.config/pip/pip.conf
```
## Credentials
- Root password stored in 1Password (blumeops vault)
- Injected into k8s via `devpi-root` secret from `secret-root.yaml.tpl`
## Backup
Private packages (`eblume/dev` index) are stored in the devpi PVC. The PyPI mirror cache (`root/pypi`) is not backed up as it can be re-fetched.
**TODO**: Add devpi PVC backup to borgmatic once k8s volume backup strategy is established.
## Related
- [[1767747119-YCPO|BlumeOps project card]]
- [[argocd|ArgoCD]] for deployment
- [[minikube|Kubernetes cluster]]
## Log
### Tue Jan 20 2026
- **Migrated to Kubernetes** (Phase 5 of k8s migration)
- Custom container image with devpi-server + devpi-web + auto-init startup script
- StatefulSet with 50Gi PVC for data persistence
- Tailscale Ingress at `pypi.tail8d86e.ts.net`
- Root password from 1Password secret, auto-initialized on first run
- Verified pip caching proxy and mcquack package upload
- **Key learnings:**
  - Minikube CRI-O can't resolve Tailscale hostnames - added registry mirror config
  - devpi-web Whoosh indexer needs ~2Gi during initial PyPI index build
  - Kubernetes auto-sets `DEVPI_PORT` for service discovery - renamed to `DEVPI_LISTEN_PORT`
- Removed LaunchAgent from indri, cleared Tailscale serve entry
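The `DEVPI_PORT` collision noted above comes from Kubernetes injecting `<SERVICE>_PORT=tcp://<ip>:<port>` into pods for each Service in the namespace. A startup script can sidestep it roughly as sketched below; this is an illustration, not the contents of the real `start.sh`, and the function name is hypothetical.
```bash
# Sketch of how a startup script might pick its listen port. Kubernetes sets
# DEVPI_PORT=tcp://<ip>:<port> for a Service named "devpi", so that variable
# cannot carry a plain port number; a distinct name is read instead.
resolve_listen_port() {
  local port="${DEVPI_LISTEN_PORT:-3141}"   # 3141 is devpi-server's default port
  case "$port" in
    ''|*[!0-9]*) echo "invalid listen port: $port" >&2; return 1 ;;
  esac
  printf '%s\n' "$port"
}
```
With only the injected `DEVPI_PORT` present, the function falls back to the default instead of trying to parse a `tcp://` URL as a port.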
### Previous (indri era)
- Initial setup with devpi on indri via mcquack LaunchAgent
- Connected via Tailscale at pypi.tail8d86e.ts.net
- Created eblume/dev index for private packages
- Metrics collection via textfile exporter

167
docs/1768506761-GHUW.md Normal file

@ -0,0 +1,167 @@
---
id: 1768506761-GHUW
aliases:
- alloy
- grafana-alloy
tags:
- blumeops
---
# Grafana Alloy Management Log
Grafana Alloy is a unified observability collector with two deployments:
1. **Indri (host)** - System metrics and service logs from macOS host
2. **Kubernetes (DaemonSet)** - Automatic pod log collection and service health probes
## Service Details (Indri)
- Binary: `~/.local/bin/alloy` (built from source with CGO_ENABLED=1)
- Config: `~/.config/grafana-alloy/config.alloy`
- Data: `~/.local/share/grafana-alloy/`
- Logs: `~/Library/Logs/mcquack.alloy.{out,err}.log`
- Managed via: mcquack LaunchAgent (`mcquack.eblume.alloy`)
**Why built from source?** The Homebrew bottle is built with `CGO_ENABLED=0`, which uses Go's pure DNS resolver. This resolver reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files, breaking Tailscale MagicDNS hostname resolution. Building with `CGO_ENABLED=1` uses the macOS native resolver.
## What Alloy Collects
### Metrics
- System metrics via `prometheus.exporter.unix` (same metrics as node_exporter)
- Textfile collector reads from `/opt/homebrew/var/node_exporter/textfile/`
  - `minikube.prom` - Minikube cluster status
  - `borgmatic.prom` - Backup status metrics
  - `zot.prom` - Container registry metrics
  - `jellyfin.prom` - Jellyfin media server metrics
- Zot registry metrics scraped from `http://localhost:5050/metrics`
- Metrics pushed to Prometheus (k8s) via remote_write at `https://prometheus.tail8d86e.ts.net/api/v1/write`
### Logs
Collects logs from all services on Indri:
**Brew services:**
- forgejo
- tailscale
**mcquack LaunchAgents:**
- alloy (stdout/stderr)
- borgmatic (stdout/stderr)
- zot (stdout/stderr)
- jellyfin (stdout/stderr)
Logs pushed to Loki (k8s) at `https://loki.tail8d86e.ts.net/loki/api/v1/push`.
## Useful Commands
```bash
# Check service status
ssh indri 'launchctl list | grep alloy'
# View alloy logs
ssh indri 'tail -f ~/Library/Logs/mcquack.alloy.err.log'
# Restart service
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist && launchctl load ~/Library/LaunchAgents/mcquack.eblume.alloy.plist'
```
## Building from Source
Alloy must be built with CGO to use macOS native DNS resolver (required for Tailscale MagicDNS):
```bash
# On gilbert (dev workstation):
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy
cd ~/code/3rd/alloy && mise use go@1.25 node yarn
mise x -- make alloy
scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy
```
Then run ansible to deploy the config and LaunchAgent.
## Ansible Management (Indri)
Alloy on Indri is managed via ansible in [[1767747119-YCPO|blumeops]].
```bash
mise run provision-indri -- --tags alloy
```
## Kubernetes Alloy (alloy-k8s)
A separate Alloy DaemonSet runs in k8s for:
- **Automatic pod log collection** - discovers and collects logs from all pods
- **Service health probes** - HTTP blackbox probes for k8s services
### Service Details (k8s)
- Namespace: `alloy`
- Image: `grafana/alloy:v1.8.2`
- ArgoCD app: `alloy-k8s`
- Manifests: `argocd/manifests/alloy-k8s/`
### What k8s Alloy Collects
**Pod logs (automatic discovery):**
- All pods in all namespaces via `loki.source.kubernetes`
- Labels: namespace, pod, container, node
**Service health probes:**
- miniflux, kiwix, transmission, devpi, argocd
- Metrics: `probe_success`, `probe_duration_seconds`
- Labels: `job="integrations/blackbox/<service>"`
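Given the job label pattern above, the probe result for one service can be queried from Prometheus. The helper name `probe_query` is hypothetical; the usage comment assumes tailnet access and `jq` on the calling machine.
```bash
# Build the PromQL selector for one probed service.
probe_query() {
  printf 'probe_success{job="integrations/blackbox/%s"}\n' "$1"
}

# Usage against the Prometheus HTTP API:
#   curl -sG https://prometheus.tail8d86e.ts.net/api/v1/query \
#     --data-urlencode "query=$(probe_query miniflux)" | jq '.data.result[].value[1]'
```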
### Useful Commands (k8s Alloy)
```bash
# View alloy-k8s logs
kubectl --context=minikube-indri -n alloy logs -f daemonset/alloy
# Check running config
kubectl --context=minikube-indri -n alloy get configmap alloy-config -o yaml
# Sync from ArgoCD
argocd app sync alloy-k8s
```
## Log
### Fri Jan 30 2026
- Removed Plex log and metrics collection (replaced by Jellyfin)
- Added Jellyfin log collection via mcquack LaunchAgent logs
- Added jellyfin.prom textfile metrics
### Thu Jan 22 2026 (later)
- **Added Alloy k8s DaemonSet** for automatic pod log collection
- Logs from all k8s pods now forwarded to Loki with automatic discovery
- Added service health probes for miniflux, kiwix, transmission, devpi, argocd
- New "Services Health" Grafana dashboard shows probe metrics
- Deleted stale textfile metrics (`devpi.prom`, `transmission.prom`) from indri
- Deleted stale data directories (`/opt/homebrew/var/loki`, `/opt/homebrew/var/prometheus`)
### Thu Jan 22 2026
- **Rebuilt from source with CGO_ENABLED=1** - required for Tailscale MagicDNS resolution
- Migrated from Homebrew to mcquack LaunchAgent management
- Updated remote_write to push to k8s Prometheus at `prometheus.tail8d86e.ts.net`
- Updated log push to k8s Loki at `loki.tail8d86e.ts.net`
- Removed prometheus/loki log collection (now running in k8s)
- Binary now at `~/.local/bin/alloy`, config at `~/.config/grafana-alloy/`
- Added build instructions to ansible role defaults
### Tue Jan 20 2026
- Removed devpi log collection (devpi migrated to k8s)
- Removed devpi.prom textfile collection (metrics role retired)
- Removed grafana log collection (grafana migrated to k8s in P2)
### Thu Jan 15 2026 (later)
- Added Plex Media Server log collection (removed 2026-01-30)
- Added plex.prom metrics from plex_metrics role (removed 2026-01-30)
### Thu Jan 15 2026
- Initial setup replacing node_exporter
- Configured metrics push via remote_write to Prometheus
- Configured log collection for all services, forwarding to Loki

82
docs/1768506761-XGYX.md Normal file

@ -0,0 +1,82 @@
---
id: 1768506761-XGYX
aliases:
- loki
tags:
- blumeops
---
# Loki Management Log
Loki is a log aggregation system running in Kubernetes (minikube on indri), providing log storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure.
## Service Details
- URL: https://loki.tail8d86e.ts.net
- Namespace: `monitoring`
- Image: `grafana/loki:3.4.2`
- ArgoCD app: `loki`
- Storage: 50Gi PVC
- Retention: 31 days
## Architecture
- Single-node deployment with filesystem storage
- TSDB index with 24h period
- Logs collected by [[alloy|Grafana Alloy]] and pushed via Loki API
- Queried via Grafana using the Loki datasource
## Useful Commands
```bash
# View logs
kubectl --context=minikube-indri -n monitoring logs -f loki-0
# Check if Loki is ready
curl -s https://loki.tail8d86e.ts.net/ready
# Sync from ArgoCD
argocd app sync loki
```
## Grafana Integration
Loki is configured as a datasource in Grafana. To explore logs:
1. Go to https://grafana.tail8d86e.ts.net/explore
2. Select "Loki" datasource
3. Use LogQL queries:
   - `{service="forgejo"}` - all forgejo logs
   - `{service="borgmatic", stream="stderr"}` - borgmatic errors
   - `{host="indri"} |= "error"` - all logs containing "error"
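The same selectors can be used outside Grafana through Loki's HTTP API. The sketch below builds a stream selector from `label=value` pairs; `logql_selector` is a hypothetical helper that does not escape quotes inside values, and the usage comment assumes tailnet access.
```bash
# Join label=value pairs into a LogQL stream selector.
logql_selector() {
  local out="" pair
  for pair in "$@"; do
    # Split each argument at its first "=" and quote the value.
    out="${out:+$out, }${pair%%=*}=\"${pair#*=}\""
  done
  printf '{%s}\n' "$out"
}

# Usage against Loki's query_range endpoint:
#   curl -sG https://loki.tail8d86e.ts.net/loki/api/v1/query_range \
#     --data-urlencode "query=$(logql_selector service=borgmatic stream=stderr)" \
#     --data-urlencode 'limit=20'
```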
## ArgoCD Management
Loki is deployed via ArgoCD from `argocd/manifests/loki/`:
- `statefulset.yaml` - StatefulSet with 50Gi PVC
- `configmap.yaml` - Loki configuration
- `service.yaml` - ClusterIP service
- `ingress-tailscale.yaml` - Tailscale Ingress
## Log
### Fri Jan 23 2026
- Suppressed noisy `v1 Endpoints is deprecated` warning from minikube storage-provisioner ([upstream issue](https://github.com/kubernetes/minikube/issues/21009))
- Added JSON field extraction for zot compatibility (`message` vs `msg`)
- Removed logfmt parsing stage - `stage.match` selectors don't prevent Alloy from logging internal decode errors, and most structured logs use JSON anyway
- Fixed devpi dashboard JSON escaping
### Thu Jan 22 2026
- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet
- Exposed via Tailscale Ingress at `loki.tail8d86e.ts.net`
- Alloy updated to push logs to k8s endpoint
- Retired ansible loki role from indri
### Thu Jan 15 2026
- Initial setup with single-node filesystem storage
- Configured 31-day retention with compactor
- Integrated with Grafana as datasource
- Logs collected via Alloy from all services

140
docs/argocd.md Normal file

@ -0,0 +1,140 @@
---
id: argocd
aliases:
- argocd
- argo-cd
tags:
- blumeops
---
# ArgoCD Management Log
ArgoCD provides GitOps continuous delivery for the [[minikube]] cluster on Indri.
## Service Details
- URL: https://argocd.tail8d86e.ts.net
- Namespace: `argocd`
- Git source: `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git`
- Manifests path: `argocd/`
## Sync Policy Decision
**Choice**: Manual sync for workload apps, auto-sync only for app-of-apps.
**Rationale** (decided 2026-01-19 during Phase 1 migration):
- During migration, we want explicit control over what gets deployed
- Auto-sync could deploy broken changes while we're still learning the stack
- The app-of-apps (`apps`) auto-syncs so new Application manifests appear automatically
- But those Applications have manual sync, so actual workload changes require `argocd app sync <name>`
**Pattern**:
| Application | Sync Policy | Why |
|-------------|-------------|-----|
| `apps` | Automated | Picks up new Application manifests from git |
| `argocd` | Manual | Self-management changes should be deliberate |
| `tailscale-operator` | Manual | Infrastructure changes need review |
| `cloudnative-pg` | Manual | Operator upgrades need care |
| `blumeops-pg` | Manual | Database changes are sensitive |
| `grafana` | Manual | Observability stack changes need review |
| `grafana-config` | Manual | Dashboard changes should be deliberate |
| `miniflux` | Manual | Application changes need review |
| `devpi` | Manual | PyPI proxy changes need review |
**Future consideration**: After migration stabilizes, consider enabling auto-sync for stable workloads. Keep manual sync for infrastructure (operators, databases).
## CLI Access
```bash
# Login (uses Tailscale for network, prompts for password)
argocd login argocd.tail8d86e.ts.net --grpc-web
# List apps
argocd app list
# Sync an app
argocd app sync <app-name>
# Check diff before sync
argocd app diff <app-name>
# Get app details
argocd app get <app-name>
```
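Because workload apps use manual sync, a diff-then-confirm wrapper can make the review step harder to skip. This is only a sketch: `argocd_review_sync` is a hypothetical name, and it relies on `argocd app diff` exiting 0 when there is no diff, 1 when a diff is found, and 2 on errors, which is how I understand the documented behavior.

```bash
# Hypothetical wrapper: show the diff, then sync only after confirmation.
# Assumes `argocd app diff` exits 0 (no diff), 1 (diff found), 2 (error).
argocd_review_sync() {
  local app="$1" rc=0 answer=""
  argocd app diff "$app" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "${app}: already in sync"
  elif [ "$rc" -eq 1 ]; then
    printf 'Sync %s? [y/N] ' "$app"
    read -r answer
    if [ "$answer" = "y" ]; then
      argocd app sync "$app"
    else
      echo "${app}: sync skipped"
    fi
  else
    echo "${app}: diff failed (exit ${rc})" >&2
    return "$rc"
  fi
}
```

Usage would be `argocd_review_sync miniflux` after logging in as shown above.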
## Applications
| App | Path | Description |
|-----|------|-------------|
| `apps` | `argocd/apps/` | App-of-apps root |
| `argocd` | `argocd/manifests/argocd/` | ArgoCD self-management |
| `tailscale-operator` | `argocd/manifests/tailscale-operator/` | Tailscale k8s operator |
| `cloudnative-pg` | Helm chart (forge mirror) | PostgreSQL operator |
| `blumeops-pg` | `argocd/manifests/databases/` | PostgreSQL cluster |
| `prometheus` | `argocd/manifests/prometheus/` | Metrics storage |
| `loki` | `argocd/manifests/loki/` | Log aggregation |
| `grafana` | Helm chart (forge mirror) | Grafana dashboards |
| `grafana-config` | `argocd/manifests/grafana-config/` | Grafana ingress & dashboards |
| `alloy-k8s` | `argocd/manifests/alloy-k8s/` | Pod log collection & service probes |
| `kube-state-metrics` | `argocd/manifests/kube-state-metrics/` | K8s resource metrics |
| `miniflux` | `argocd/manifests/miniflux/` | RSS feed reader |
| `devpi` | `argocd/manifests/devpi/` | PyPI caching proxy |
| `torrent` | `argocd/manifests/torrent/` | BitTorrent daemon |
| `kiwix` | `argocd/manifests/kiwix/` | Offline Wikipedia & ZIM archives |
| `forgejo-runner` | `argocd/manifests/forgejo-runner/` | Forgejo Actions CI runner (host mode) |
## Credentials
- Admin password stored in 1Password (updated from initial auto-generated password)
- Git access via deploy key (SSH) stored in 1Password
## Log
### 2026-01-23 (CI/CD Bootstrap Phase 1)
- Added `forgejo-runner` - Forgejo Actions CI runner
- Runner uses host mode (jobs run directly in runner container, no Docker needed)
- Labels: `ubuntu-latest`, `ubuntu-22.04`
- Note: Stock runner lacks Node.js, so `actions/checkout@v4` doesn't work - use git clone instead
- See [[forgejo]] for runner token management and workflow examples
### 2026-01-22 (Observability Cleanup)
- Added `alloy-k8s` - DaemonSet for automatic pod log collection and service health probes
- Added `kube-state-metrics` - provides k8s resource metrics (pod counts, resource requests, etc.)
- Enhanced Minikube dashboard with namespace filtering and resource usage panels
- Added "Services Health" dashboard with probe metrics for all k8s services
- Fixed macOS dashboard instance variable to only show macOS hosts
- Cleaned up stale data: removed old textfile metrics and directories from indri
- Removed stale `/opt/homebrew/var/loki` from borgmatic backup sources
### 2026-01-22 (Phase 7)
- **Migrated Prometheus and Loki to k8s** - completed observability stack migration
- Both now running as StatefulSets with 50Gi PVCs
- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` and `loki.tail8d86e.ts.net`
- Grafana datasources updated to use k8s-internal service URLs
- Alloy rebuilt with CGO for Tailscale DNS resolution, pushes to k8s endpoints
- Retired ansible prometheus and loki roles from indri
### 2026-01-21 (Phase 6)
- Added torrent (Transmission BitTorrent) to k8s
- Added kiwix (offline Wikipedia & ZIM archives) to k8s
- NFS storage from sifaka for shared torrent/ZIM data
### 2026-01-20 (Phase 5)
- Added devpi (PyPI caching proxy) to k8s
- Custom container image in zot registry with devpi-server + devpi-web
- StatefulSet with 50Gi PVC for data persistence
- Changed `apps` Application to manual sync (was auto-sync with prune)
### 2026-01-19 (Phase 2)
- Migrated Grafana from Homebrew/Ansible to Kubernetes
- Helm chart repos now mirrored to forge (cloudnative-pg-charts, grafana-helm-charts)
- SSH credential template (`repo-creds-forge`) for all forge repos
- Added indri SSH host key to ArgoCD known_hosts
- Tailscale service cutover: deleted old svc:grafana from Tailscale admin to free hostname
- Retired ansible grafana role
### 2026-01-19 (Phase 1)
- Completed Phase 1 deployment
- Decided on manual sync policy for workloads
- Using internal [[forgejo]] as git source (not GitHub mirror)
- Exposed via Tailscale Ingress with Let's Encrypt TLS

176
docs/borgmatic.md Normal file

@ -0,0 +1,176 @@
---
id: borgmatic
aliases:
- borgmatic
- borg-backup
tags:
- blumeops
---
# Borgmatic Management Log
Borgmatic runs daily backups from Indri to Sifaka NAS using Borg backup.
## Service Details
- Installed via: mise (pipx)
- Config: `~/.config/borgmatic/config.yaml` (ansible-managed)
- Schedule: Daily at 2:00 AM via LaunchAgent
- Repository: `/Volumes/backups/borg/` on Sifaka
## What Gets Backed Up
**Directories:**
- `~/code/personal/zk` - Zettelkasten (primary)
- `/opt/homebrew/var/forgejo` - Git forge data
- `~/.config/borgmatic` - Borgmatic config itself
- `~/Documents` - Personal documents
- `~/Pictures` - Photos (see note below)
**Note on iCloud Photos:** macOS Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally. Borgmatic only backs up what's on disk, so iCloud-only photos are NOT backed up. If you need full photo backups via borgmatic, either disable "Optimize Mac Storage" in Photos preferences, or use a tool like osxphotos which forces downloads. See log entry 2026-01-28.
**Databases:**
- `miniflux` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me)
- `teslamate` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me)
**Not backed up (by design):**
- ZIM archives in `~/transmission/` - re-downloadable via torrent
- Prometheus metrics - ephemeral data
- Loki logs - ephemeral (now in k8s PVC)
- devpi data - in k8s PVC, backup strategy TBD
## PostgreSQL Backup
Borgmatic uses native `postgresql_databases` support to stream `pg_dump` directly to Borg:
- No intermediate files needed
- Database keeps running (no downtime)
- Consistent transactional snapshots
- Uses `borgmatic` user with `pg_read_all_data` role
- Password read from `~/.pgpass` (managed by borgmatic ansible role)
- Uses explicit `pg_dump_command` path (`/opt/homebrew/opt/postgresql@18/bin/pg_dump`) since LaunchAgent doesn't have homebrew in PATH
- Uses explicit `local_path` (`/opt/homebrew/bin/borg`) for same reason
**Databases backed up:**
- `pg.ops.eblu.me:5432/miniflux` - CloudNativePG cluster in k8s
- `pg.ops.eblu.me:5432/teslamate` - CloudNativePG cluster in k8s
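Since the dump depends on `~/.pgpass`, a quick way to confirm the file covers a given database can save a failed nightly run. The helper below is a hypothetical sketch (`pgpass_has_entry` is not part of the ansible role). It assumes the standard `host:port:database:user:password` layout with `*` as a wildcard, and it does not handle escaped colons.

```bash
# Hypothetical check: does a .pgpass file cover host:port:database:user?
# Fields are colon-separated; "*" in the file acts as a wildcard.
pgpass_has_entry() {
  local file="$1" host="$2" port="$3" db="$4" user="$5"
  awk -F: -v h="$host" -v p="$port" -v d="$db" -v u="$user" '
    /^#/ { next }
    ($1 == h || $1 == "*") && ($2 == p || $2 == "*") &&
    ($3 == d || $3 == "*") && ($4 == u || $4 == "*") { found = 1 }
    END { exit !found }
  ' "$file"
}
```

For example, `pgpass_has_entry ~/.pgpass pg.ops.eblu.me 5432 miniflux borgmatic && echo ok`. Note that libpq ignores the file entirely if its permissions are looser than `0600`.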
## Ansible Management
Borgmatic is fully managed via ansible in [[1767747119-YCPO|blumeops]]:
```bash
mise run provision-indri -- --tags borgmatic
```
The role deploys:
- `~/.config/borgmatic/config.yaml` - Main configuration
- LaunchAgent plist for scheduled runs
## Useful Commands
```bash
# List archives
ssh indri 'mise x -- borgmatic list'
# Extract from latest archive
ssh indri 'mise x -- borgmatic extract --archive latest --path /some/path'
# Run backup manually
ssh indri 'mise x -- borgmatic create --verbosity 1'
# Check repository health
ssh indri 'mise x -- borgmatic check'
```
## Retention Policy
- 7 daily backups
- 12 monthly backups
- 1000 yearly backups (effectively forever)
## Monitoring
Borgmatic metrics are collected hourly via a script at `~/bin/borgmatic-metrics` and exposed to Prometheus via the node_exporter textfile collector.
View the Grafana dashboard at: https://grafana.tail8d86e.ts.net (select "Borgmatic Backups" dashboard)
Metrics include:
- `borgmatic_up` - repository accessibility
- `borgmatic_repo_deduplicated_size_bytes` - actual disk usage
- `borgmatic_last_archive_original_size_bytes` - size of data being backed up
- `borgmatic_last_archive_deduplicated_size_bytes` - new data added per backup
- `borgmatic_archive_count` - number of archives
- `borgmatic_last_archive_timestamp` - when last backup ran
```bash
# Check metrics file
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom'
# Check metrics LaunchAgent status
ssh indri 'launchctl list | grep borgmatic-metrics'
```
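For a quick staleness check without opening Grafana, the age of the last archive can be computed from the textfile directly. This is a sketch under one assumption: that `borgmatic_last_archive_timestamp` holds Unix epoch seconds. `backup_age_seconds` is a hypothetical helper name.

```bash
# Hypothetical helper: print seconds since the last archive, read from a
# node_exporter textfile. Assumes the metric value is Unix epoch seconds.
# The optional second argument overrides "now" (useful for testing).
backup_age_seconds() {
  local prom_file="$1" now="${2:-$(date +%s)}"
  awk -v now="$now" '
    /^borgmatic_last_archive_timestamp/ { printf "%d\n", now - $NF; found = 1; exit }
    END { exit !found }
  ' "$prom_file"
}
```

Run on indri, `backup_age_seconds /opt/homebrew/var/node_exporter/textfile/borgmatic.prom` should stay under roughly 86400 while daily backups are healthy. The function exits non-zero if the metric is missing.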
## Log
### Wed Jan 28 2026
- Investigated massive backup size increase (~69GB deduplicated, ~94GB per archive)
- Root cause: immich-sync role (added Jan 26, removed Jan 28) used osxphotos to export photos
- **Lesson learned:** osxphotos forces Photos.app to download all iCloud originals locally
- Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally
- Before immich-sync: borgmatic was backing up thumbnails (~few GB)
- After immich-sync: borgmatic now has full 42GB of photo originals
- This is actually a bonus - provides redundant photo backup alongside iCloud and Immich
- Retention policy means these photos will be kept in yearly archives essentially forever
- **Future plan:** Once Immich (on sifaka "photos" volume with Synology offsite backup) is fully set up, Pictures may be removed from borgmatic as redundant
### Fri Jan 23 2026
- Note: Forgejo `app.ini` is now managed by ansible (secrets in 1Password)
- `/opt/homebrew/var/forgejo` still backed up for git repositories and data
- But `app.ini` recovery no longer depends on borgmatic (can be regenerated via ansible)
### Thu Jan 22 2026
- Removed `/opt/homebrew/var/loki` from backup sources (stale data from pre-k8s migration)
- Loki now runs in k8s on a PVC whose contents are treated as ephemeral - logs are not backed up by design
- Verified backup integrity after cleanup
### Tue Jan 20 2026 (P5)
- Removed `~/devpi` from backup sources (devpi migrated to k8s)
- devpi data now in k8s PVC - backup strategy TBD
### Mon Jan 19 2026 (P4)
- Removed localhost PostgreSQL backup (brew pg retired)
- Updated to back up only `pg.tail8d86e.ts.net` (k8s CloudNativePG)
- Moved .pgpass management from postgresql role to borgmatic role
### Mon Jan 19 2026 (P3)
- Fixed borgmatic failing to find `borg` binary by adding `local_path` option to config
- Added k8s-pg (CloudNativePG cluster) backup alongside brew PostgreSQL
- Added ACL grant for `tag:homelab` -> `tag:k8s` on port 5432 for backup access
- Successfully tested disaster recovery: restored miniflux data from borgmatic dump to k8s-pg
- Created `borgmatic` user in k8s-pg via CloudNativePG managed roles
- Both localhost and k8s-pg databases backed up during migration period
### Sun Jan 18 2026
- Fixed borgmatic-metrics script failing in LaunchAgent context by using absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`) instead of `mise x -- borg`
- This was causing the Grafana dashboard to show "Repository Status: DOWN" and missing time series data
### Sat Jan 17 2026
- Fixed PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found")
- Removed `~/code/3rd/kiwix-tools` from backups (was just symlinks, ZIM archives are re-downloadable)
- Enabled Loki log backup (removed from exclude_patterns)
- Added borgmatic_metrics role for Prometheus metrics collection
- Added Grafana dashboard for backup monitoring (size trends, dedup ratio, time since last backup)
### Fri Jan 16 2026
- Moved config from manual management to ansible-managed template
- Added `postgresql_databases` backup for miniflux database
- Config now deployed via `ansible/roles/borgmatic/templates/config.yaml.j2`

75
docs/external-secrets.md Normal file

@ -0,0 +1,75 @@
---
id: external-secrets
aliases:
- external-secrets
- eso
- external-secrets-operator
tags:
- blumeops
---
# External Secrets Operator
External Secrets Operator (ESO) syncs secrets from 1Password to Kubernetes Secrets via 1Password Connect.
## Architecture
```
1Password Cloud
|
v
1Password Connect (namespace: 1password)
|
v
External Secrets Operator (namespace: external-secrets)
|
v
Native Kubernetes Secrets
```
## Usage
ClusterSecretStore `onepassword-blumeops` provides access to the blumeops vault. See `argocd/manifests/devpi/external-secret.yaml` for a simple example.
**Important:** 1Password Connect doesn't support the `?ssh-format=openssh` query parameter. SSH keys must be stored as Secure Notes with the OpenSSH-formatted key (see `argocd-forge-ssh-key` item).
```bash
# Check all ExternalSecrets
kubectl --context=minikube-indri get externalsecret -A
# Find 1Password field names
op item get <item> --vault blumeops --format json | jq '.fields[] | .label'
```
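After adding or changing an ExternalSecret, it can help to block until ESO reports it as synced rather than polling by hand. A minimal sketch, assuming the resource exposes the usual `Ready` condition; `wait_external_secret` is a hypothetical helper name.

```bash
# Hypothetical helper: block until an ExternalSecret reports Ready.
# Arguments: namespace, ExternalSecret name, optional timeout (default 60s).
wait_external_secret() {
  local namespace="$1" name="$2" timeout="${3:-60s}"
  kubectl --context=minikube-indri -n "$namespace" \
    wait --for=condition=Ready "externalsecret/${name}" --timeout="$timeout"
}
```

Usage would look like `wait_external_secret devpi <name>`, where the name comes from the `get externalsecret -A` listing above.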
## Bootstrap (One-Time Setup)
If reinstalling from scratch:
1. Create Connect server credentials:
```bash
op connect server create blumeops --vaults blumeops
op connect token create blumeops --server <server-id> --vault blumeops
```
2. Store in 1Password item "1Password Connect":
- `credentials-file`: raw JSON
- `credentials-base64`: base64-encoded JSON
- `token`: access token
3. Apply bootstrap secret:
```bash
kubectl --context=minikube-indri create namespace 1password
op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | \
kubectl --context=minikube-indri apply -f -
```
4. Sync apps in order:
- `argocd app sync 1password-connect`
- `argocd app sync external-secrets-crds`
- `argocd app sync external-secrets`
- `argocd app sync external-secrets-config`
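The ordered sync in step 4 can be scripted so that a failure stops the chain before dependent apps are touched. This is a sketch, not an existing mise task: `bootstrap_sync` is a hypothetical name, and the 300-second health timeout is an arbitrary choice.

```bash
# Sketch: sync the bootstrap apps in dependency order, stopping on the
# first failure. Each app must become healthy before the next one starts.
bootstrap_sync() {
  local app
  for app in 1password-connect external-secrets-crds external-secrets external-secrets-config; do
    echo "==> ${app}"
    argocd app sync "$app" || return 1
    argocd app wait "$app" --health --timeout 300 || return 1
  done
}
```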
## Related
- [[1767747119-YCPO|BlumeOps]]
- [[argocd|ArgoCD]]

58
docs/grafana.md Normal file

@ -0,0 +1,58 @@
---
id: grafana
aliases:
- grafana
tags:
- blumeops
---
# Grafana Management Log
Grafana provides dashboards and observability for [[blumeops]].
## Service Details
- URL: https://grafana.ops.eblu.me (also https://grafana.tail8d86e.ts.net)
- Namespace: `monitoring`
- Helm chart: grafana (mirrored to forge)
- Values: `argocd/manifests/grafana/values.yaml`
- Dashboards: `argocd/manifests/grafana-config/dashboards/`
## Embedding Note
Grafana panel embedding via iframes was attempted for Homepage but didn't work well:
- Homepage's iframe widget doesn't support width constraints (only height)
- Grafana's "Public Dashboards" feature doesn't support template variables or PostgreSQL datasources
- Anonymous auth would be required, which exposes all dashboards
Current config has `allow_embedding: false`. If revisiting this, see git history for the iframe attempt (2026-01-30).
## Datasources
| Name | Type | URL |
|------|------|-----|
| Prometheus | prometheus | `http://prometheus.monitoring.svc.cluster.local:9090` |
| Loki | loki | `http://loki.monitoring.svc.cluster.local:3100` |
| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` |
## Dashboard Provisioning
Dashboards are provisioned via ConfigMaps with label `grafana_dashboard: "1"`. The sidecar watches for these and loads them automatically.
To add a dashboard:
1. Create ConfigMap in `argocd/manifests/grafana-config/dashboards/`
2. Add label `grafana_dashboard: "1"`
3. Optionally add annotation `grafana_folder: "FolderName"` for organization
4. Sync the `grafana-config` ArgoCD app
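One way to produce the ConfigMap for steps 1 and 2 is to render it from an exported dashboard JSON file with `kubectl` in client-side mode. This is a hedged sketch: `render_dashboard_configmap` is a hypothetical helper, the `monitoring` namespace is assumed from the service details above, and the existing dashboards in the repo may have been written by hand instead.

```bash
# Hypothetical helper: render a labeled dashboard ConfigMap manifest from a
# JSON file without contacting the cluster. Output goes to stdout.
render_dashboard_configmap() {
  local name="$1" json_file="$2"
  kubectl create configmap "$name" --namespace monitoring \
    --from-file="$json_file" --dry-run=client -o yaml |
    kubectl label --local -f - grafana_dashboard=1 -o yaml
}
```

The output could then be redirected into a new file under `argocd/manifests/grafana-config/dashboards/` and committed before syncing.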
## Log
### 2026-01-30
- Attempted Grafana iframe embeds for Homepage metrics panel
- Issues: width constraints don't work, some panels fail to load
- Reverted to authenticated-only access (no anonymous auth)
### 2026-01-19 (Phase 2)
- Migrated from Homebrew/Ansible to Kubernetes
- Helm chart mirrored to forge
- Exposed via Tailscale Ingress

65
docs/indri.md Normal file

@ -0,0 +1,65 @@
---
id: indri
aliases:
- indri
- mac-mini
tags:
- blumeops
---
# Indri Maintenance Log
Indri is a Mac Mini M1 (2020) serving as the primary [[1767747119-YCPO|BlumeOps]] server.
## Host Details
- Model: Mac mini M1, 2020 (Macmini9,1)
- Storage: 2TB internal SSD
- macOS: 15.7.3 (Sequoia)
- Role: Primary server for homelab services
## Passwordless Sudo
Configured passwordless sudo for `erichblume` user to allow ansible `become: true` tasks to run without password prompts:
```bash
# Config at /etc/sudoers.d/erichblume
erichblume ALL=(ALL) NOPASSWD: ALL
```
This is acceptable given the security model - tailnet access is the trust boundary.
## Sleep Prevention
Indri must stay awake to serve network requests. Currently using **Amphetamine** (App Store) to prevent sleep.
**Configuration:**
- Start Session At Launch: enabled
- Default Duration: indefinite
- Allow Closed-Display Sleep: enabled (no display attached)
**Known Issue:** Amphetamine can crash after extended uptime (~12 days observed), leaving the system unprotected. If this becomes a recurring problem, consider switching to system-level sleep prevention:
```bash
# Option 1: Disable sleep via pmset (requires sudo)
sudo pmset -c sleep 0 displaysleep 0
# Option 2: Use caffeinate daemon via LaunchAgent
# Create ~/Library/LaunchAgents/com.local.caffeinate.plist
caffeinate -s # -s = prevent sleep on AC power
```
These could be managed via ansible for reliability.
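Until then, a crashed Amphetamine can at least be detected by checking whether any process still holds a sleep-prevention assertion. The sketch below parses the system-wide summary printed by `pmset -g assertions`. The helper name is hypothetical, and the parsing assumes the usual `Name  0|1` summary lines.

```bash
# Hypothetical check: reads `pmset -g assertions` output on stdin and
# succeeds if a system-sleep-prevention assertion is currently active.
sleep_is_prevented() {
  awk '$1 ~ /^Prevent(UserIdle)?SystemSleep$/ && $2 == 1 { found = 1 }
       END { exit !found }'
}
```

From gilbert this could be used as `ssh indri 'pmset -g assertions' | sleep_is_prevented || echo "indri may go to sleep"`.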
## Log
### Tue Jan 20 2026
**Amphetamine crash caused overnight sleep**
- Amphetamine 5.3.2 crashed at 19:08 on Jan 19 (segfault in `objc_release` during timer callback)
- System went to sleep at 19:20, stayed asleep overnight
- Discovered when services were unreachable; manually restarted Amphetamine at ~07:30
- Crash report: `~/Library/Logs/DiagnosticReports/Amphetamine-2026-01-19-190921.ips`
- Root cause: Memory management bug in Amphetamine during long-running session (~12 days uptime)
- Action: Monitoring for now; if recurs, will implement `pmset`/`caffeinate` via ansible

90
docs/jellyfin.md Normal file

@ -0,0 +1,90 @@
---
id: jellyfin
aliases:
- jellyfin
tags:
- blumeops
---
# Jellyfin Management Log
Jellyfin is a free, open-source media server running natively on [[indri|Indri]] for full VideoToolbox hardware transcoding support.
## Service Details
- URL: https://jellyfin.ops.eblu.me
- Port: 8096 (localhost only, proxied via Caddy)
- Data directory: `~/Library/Application Support/jellyfin`
- Media path: `/Volumes/allisonflix` (NFS from sifaka)
- LaunchAgent: `mcquack.jellyfin`
## Useful Commands
```bash
# Check LaunchAgent status
ssh indri 'launchctl list | grep jellyfin'
# View logs
ssh indri 'tail -f ~/Library/Logs/mcquack.jellyfin.err.log'
# Check port is listening
ssh indri 'lsof -nP -iTCP:8096 -sTCP:LISTEN'
# Restart Jellyfin
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist && launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist'
# Check metrics
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/jellyfin.prom'
```
## Hardware Transcoding
Jellyfin uses Apple VideoToolbox for hardware-accelerated transcoding on the M1 Mac Mini.
**Capabilities:**
- H.264 encode/decode: Hardware
- HEVC (H.265) encode/decode: Hardware
- AV1 decode: Software only (requires M3+)
- HDR to SDR tone mapping: VPP (hardware)
- Concurrent 4K streams: ~3 with HDR tonemapping
**Configuration** (Dashboard > Playback):
1. Hardware Acceleration: Apple VideoToolbox
2. Allow hardware encoding: Enabled
3. VPP Tone mapping: Enabled (for HDR to SDR)
## Observability
- Metrics: Collected via `jellyfin_metrics` ansible role to Prometheus textfile
- Logs: Forwarded to Loki via Alloy (`service="jellyfin"`)
- Dashboard: "Jellyfin Media Server" in Grafana
### Metrics collected:
- `jellyfin_up` - Server availability
- `jellyfin_version_info` - Server version
- `jellyfin_library_items{library,type}` - Library counts
- `jellyfin_sessions_total` - Active sessions
- `jellyfin_sessions_playing` - Playing sessions
- `jellyfin_transcode_sessions_total` - Transcoding sessions
## API Key Setup
Metrics collection requires an API key:
1. Open https://jellyfin.ops.eblu.me
2. Go to Dashboard > API Keys > Add
3. Create key with description "metrics"
4. Save to indri:
```bash
ssh indri 'echo "YOUR_API_KEY" > ~/.jellyfin-api-key && chmod 600 ~/.jellyfin-api-key'
```
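To confirm the saved key works before relying on the metrics role, the API can be called directly on indri. A minimal sketch: `jellyfin_api` and the `JELLYFIN_URL` / `JELLYFIN_KEY_FILE` overrides are hypothetical, and it assumes Jellyfin accepts the `Authorization: MediaBrowser Token="..."` header for API keys.

```bash
# Hypothetical helper, meant to run on indri: call the Jellyfin API using
# the key saved in ~/.jellyfin-api-key.
jellyfin_api() {
  local endpoint="$1" key
  key=$(cat "${JELLYFIN_KEY_FILE:-$HOME/.jellyfin-api-key}") || return 1
  curl -s -H "Authorization: MediaBrowser Token=\"${key}\"" \
    "${JELLYFIN_URL:-http://localhost:8096}${endpoint}"
}
```

For example, `jellyfin_api /Sessions | jq length` should print the number of active sessions if the key is valid.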
## Log
### 2026-01-30 (Initial Deployment)
- Deployed Jellyfin natively on indri via Ansible
- Installed via Homebrew cask, managed via LaunchAgent
- Added Caddy routing for `jellyfin.ops.eblu.me`
- Added metrics collection (jellyfin_metrics role)
- Added log collection via Alloy
- Created Grafana dashboard

103
docs/kiwix.md Normal file

@ -0,0 +1,103 @@
---
id: kiwix
aliases:
- kiwix
tags:
- blumeops
---
# Kiwix Management Log
Kiwix serves offline Wikipedia (and other ZIM archives) in Kubernetes via Tailscale at https://kiwix.tail8d86e.ts.net.
## Service Details
- URL: https://kiwix.tail8d86e.ts.net
- Namespace: `kiwix`
- Image: `ghcr.io/kiwix/kiwix-serve:3.8.1`
- ArgoCD app: `kiwix`
- Storage: NFS mount from sifaka (`/volume1/torrents`)
## Architecture
The kiwix deployment has two components:
1. **kiwix-serve** - Main container serving ZIM files at port 80
2. **torrent-sync** - Sidecar that syncs declarative ZIM torrent list to Transmission
A CronJob (`zim-watcher`) runs hourly to detect new ZIM files and trigger a deployment restart when needed.
## Useful Commands
```bash
# View kiwix logs
kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c kiwix-serve
# View torrent sync logs
kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c torrent-sync
# Check ZIM watcher job
kubectl --context=minikube-indri -n kiwix get cronjob zim-watcher
# Manually trigger ZIM watcher
kubectl --context=minikube-indri -n kiwix create job --from=cronjob/zim-watcher zim-watcher-manual
# Sync from ArgoCD
argocd app sync kiwix
```
## ArgoCD Management
Kiwix is deployed via ArgoCD from `argocd/manifests/kiwix/`:
- `deployment.yaml` - Kiwix-serve + torrent-sync sidecar
- `service.yaml` - ClusterIP service
- `ingress-tailscale.yaml` - Tailscale Ingress
- `configmap-zim-torrents.yaml` - Declarative list of ZIM torrents to download
- `configmap-sync-script.yaml` - Script to sync torrents to Transmission
- `cronjob-zim-watcher.yaml` - Hourly job to restart kiwix on new ZIMs
## Adding New ZIM Archives
1. Edit `argocd/manifests/kiwix/configmap-zim-torrents.yaml`
2. Add the torrent URL from https://download.kiwix.org/zim/
3. Sync the kiwix app: `argocd app sync kiwix`
4. The torrent-sync sidecar will add the torrent to [[transmission|Transmission]]
5. Once downloaded, the zim-watcher CronJob will detect it and restart kiwix
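The detect-and-restart idea in step 5 can be illustrated with a fingerprint of the ZIM file list. This is an illustration only, not the actual `zim-watcher` script (see `cronjob-zim-watcher.yaml` for that): the function names and the state-file approach are assumptions.

```bash
# Illustrative only; the real zim-watcher may work differently.
# Checksum of the sorted list of *.zim paths in a directory.
zim_fingerprint() {
  find "$1" -maxdepth 1 -name '*.zim' | sort | cksum | cut -d' ' -f1
}

# Restart kiwix when the fingerprint differs from the one stored in
# state_file. The first run (no state file yet) also triggers a restart.
restart_if_changed() {
  local dir="$1" state_file="$2" current
  current=$(zim_fingerprint "$dir")
  if [ "$current" != "$(cat "$state_file" 2>/dev/null)" ]; then
    echo "$current" > "$state_file"
    kubectl -n kiwix rollout restart deployment/kiwix
  fi
}
```

Fingerprinting names only keeps the check cheap on NFS, at the cost of not noticing a file that is replaced in place under the same name.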
## Configured Archives
The declarative torrent list includes:
- Wikipedia top 1M English articles with images
- Project Gutenberg (60,000+ public domain books)
- iFixit repair guides
- Stack Exchange sites (SuperUser, Math, etc.)
- LibreTexts textbooks (Bio, Chem, Eng, Math, Phys, Humanities)
- DevDocs (developer documentation bundles)
See `argocd/manifests/kiwix/configmap-zim-torrents.yaml` for the full list.
## Storage
ZIM files are stored on sifaka NAS at `/volume1/torrents/complete/`. The kiwix pod mounts this directory via NFS.
**Note**: The NFS mount works because minikube uses the docker driver which NATs through indri's LAN IP, allowing direct access to sifaka.
## Log
### 2026-01-21 (P6)
- **Migrated to Kubernetes** (Phase 6 of k8s migration)
- Direct NFS mount from sifaka (no PVC, shared with transmission)
- Torrent-sync sidecar adds configured ZIMs to Transmission
- ZIM-watcher CronJob restarts deployment when new files appear
- Tailscale Ingress at `kiwix.tail8d86e.ts.net`
- Retired ansible kiwix role from indri
### 2026-01-14
- Added transmission integration for background torrent downloads
- Enabled Gutenberg, iFixit, SuperUser, Math SE, and all LibreTexts archives
### 2026-01-13
- Added kiwix role to ansible playbook
- Operationalized ZIM archive downloads with configurable list
- Initial setup with kiwix-tools binary on indri
- Managed via LaunchAgent on port 5501

83
docs/miniflux.md Normal file

@ -0,0 +1,83 @@
---
id: miniflux
aliases:
- miniflux
- feed
- rss
tags:
- blumeops
---
# Miniflux Management Log
Miniflux is a minimalist RSS/Atom feed reader running in Kubernetes (minikube on indri).
## Service Details
- URL: https://feed.tail8d86e.ts.net
- Namespace: miniflux
- Image: ghcr.io/miniflux/miniflux:latest
- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net)
- ArgoCD app: miniflux
## Useful Commands
```bash
# View logs
kubectl --context=minikube-indri -n miniflux logs -f deployment/miniflux
# Restart deployment
kubectl --context=minikube-indri -n miniflux rollout restart deployment/miniflux
# Check health
curl https://feed.tail8d86e.ts.net/healthcheck
# Sync from ArgoCD
argocd app sync miniflux
```
## ArgoCD Management
Miniflux is deployed via ArgoCD from `argocd/manifests/miniflux/`:
- `deployment.yaml` - Deployment with environment configuration
- `service.yaml` - ClusterIP service
- `ingress-tailscale.yaml` - Tailscale Ingress for external access
## Credentials
The miniflux database user password is auto-generated by CloudNativePG and stored in the `blumeops-pg-app` secret in the databases namespace.
To recreate the miniflux-db secret:
```bash
kubectl --context=minikube-indri create secret generic miniflux-db -n miniflux \
  --from-literal=url="$(kubectl --context=minikube-indri -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)"
```
## Features
- Keyboard shortcuts for efficient reading
- Fever and Google Reader API compatible
- Mobile-friendly web interface
- OPML import/export
- Content scraping for full articles
## Backup
Feed subscriptions and read state stored in [[postgresql]], backed up via borgmatic's postgresql_databases hook.
## Log
### Mon Jan 19 2026
- **Migrated to Kubernetes** (Phase 4 of k8s migration)
- Deployed via ArgoCD in `miniflux` namespace
- Database connection via internal k8s DNS to CloudNativePG cluster
- Exposed via Tailscale Ingress at feed.tail8d86e.ts.net
- Removed brew miniflux service and ansible role from indri
- Fixed table ownership issue after P3 restore (tables were owned by eblume, needed to be owned by miniflux)
### Fri Jan 16 2026
- Initial setup with Miniflux 2.x on brew
- Connected to PostgreSQL 18 on localhost
- Exposed via Tailscale at feed.tail8d86e.ts.net

137
docs/minikube.md Normal file

@ -0,0 +1,137 @@
---
id: minikube
aliases:
- minikube
- kubernetes
- k8s
tags:
- blumeops
---
# Minikube Management Log
Minikube provides a single-node Kubernetes cluster on Indri for running containerized services.
## Cluster Details
- Driver: **docker** (runs as container inside Docker Desktop)
- Container runtime: docker
- Kubernetes version: v1.34.0
- Resources: 6 CPUs, 11GB RAM (leaves 1GB for Docker Desktop overhead), 200GB disk
- API server: https://k8s.tail8d86e.ts.net (Tailscale service with TCP passthrough)
- Internal port: dynamic (currently 50820 - Docker maps random host port to container's 6443)
**Prerequisites:** Docker Desktop must be installed and running with at least 12GB memory allocated.
## Remote Access from Gilbert
Run `mise run ensure-minikube-indri-kubectl-config` to set up kubectl access. This script:
1. Fetches certificates from indri via SSH
2. Creates kubeconfig at `~/.kube/minikube-indri/config.yml`
**Fish abbreviations** (in `~/.config/fish/config.fish`):
- `ki` -> `kubectl --context=minikube-indri`
- `k9i` -> `k9s --context=minikube-indri`
- `k9` -> `k9s`
```bash
# Quick access via abbreviations
ki get nodes
k9i
# Or explicitly set context
kubectl config use-context minikube-indri
kubectl get nodes
```
## Volume Mounting (for P6 kiwix/transmission)
**Direct NFS from pods to sifaka** - tested and working.
Docker NATs outbound traffic through indri's LAN IP (192.168.1.50). Sifaka's NFS exports allow:
- `192.168.1.0/24` - Docker containers via indri NAT
- `100.64.0.0/10` - Tailscale clients
Pods mount NFS directly:
```yaml
volumes:
  - name: torrents
    nfs:
      server: sifaka
      path: /volume1/torrents
```
No LaunchAgents, no `minikube mount`, no hostPath complexity needed.
## Useful Commands (on indri)
```bash
# Cluster status
minikube status
# Start/stop cluster
minikube start
minikube stop
# Access dashboard
minikube dashboard
# SSH into node
minikube ssh
# View logs
minikube logs
# Get API server URL (shows current port)
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
```
## Registry Mirror (Zot)
Containerd is configured to use [[zot]] on indri as a pull-through cache for container images. This is managed by the ansible `minikube` role.
Config location: `/etc/containerd/certs.d/<registry>/hosts.toml` (inside minikube container)
With docker driver, uses `host.minikube.internal:5050` to reach zot on the host.
Mirrors configured for:
- `registry.ops.eblu.me` (private images)
- `docker.io`
- `ghcr.io`
- `quay.io`
To verify the mirror is working:
```bash
# Check zot's cached images
curl -s http://localhost:5050/v2/_catalog | jq
```
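For reference, a per-registry `hosts.toml` generally has the shape rendered by the sketch below. This is not the ansible role's template: `render_hosts_toml` is a hypothetical helper, and the capability list and plain-HTTP mirror URL are assumptions based on the `host.minikube.internal:5050` address mentioned above.

```bash
# Hypothetical helper: print a containerd hosts.toml that tries the zot
# mirror first and falls back to the upstream registry.
render_hosts_toml() {
  local upstream="$1" mirror="${2:-http://host.minikube.internal:5050}"
  cat <<EOF
server = "${upstream}"

[host."${mirror}"]
  capabilities = ["pull", "resolve"]
EOF
}
```

For example, `render_hosts_toml https://ghcr.io` prints the content that would live at `/etc/containerd/certs.d/ghcr.io/hosts.toml`.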
## Log
### 2026-01-21 (Docker Driver Migration)
- **Migrated from qemu2 to docker driver** (Phase 5.1)
- qemu2 had Tailscale TCP proxy issue (TLS handshake timeout to VM IP)
- docker driver puts API server on localhost, which Tailscale serve handles correctly
- Removed socket_vmnet, qemu dependencies
- Removed NFS/minikube-mount LaunchAgents (will re-add NFS for P6 with simpler hostPath approach)
- API server port is now dynamic (Docker assigns random host port)
- Ansible role updated to query port and configure tailscale serve accordingly
- Created `mise run ensure-minikube-indri-kubectl-config` for workstation setup
### 2026-01-21 (QEMU2 Migration - superseded)
- Migrated from podman to qemu2 driver
- Podman driver had fundamental limitations preventing volume mounts
- qemu2 created actual VM with full kernel capabilities
- Volume mounting solution: NFS on host + minikube mount passthrough
- **Issue discovered:** Tailscale TCP proxy to VM IP (192.168.105.2:6443) fails with TLS timeout
### 2026-01-19
- Configured CRI-O registry mirror to use zot as pull-through cache
- Added ansible automation to apply mirror config on provisioning
- Fixed ansible hanging: `minikube ssh` with piped stdin requires `--native-ssh=false`
### 2026-01-18
- Initial cluster setup for k8s migration Phase 0
- Configured for remote access with --apiserver-names=indri
- 1Password credential integration for kubectl from gilbert
- Exposed as Tailscale service `k8s.tail8d86e.ts.net` with TCP passthrough

80
docs/navidrome.md Normal file

@ -0,0 +1,80 @@
---
id: navidrome
aliases:
- DJ
tags:
- blumeops
- service
---
Navidrome is a self-hosted music streaming server deployed on [[blumeops|BlumeOps]].
# Access
- **Primary URL**: https://dj.ops.eblu.me (via Caddy)
- **Tailscale URL**: https://dj.tail8d86e.ts.net
# Deployment
Navidrome runs in Kubernetes (minikube on [[indri]]) and is managed via [[argocd|ArgoCD]].
**Manifests**: `argocd/manifests/navidrome/`
## Storage
| Mount | Type | Source | Access |
|---------|-------------------|-------------------------|------------|
| /music | NFS PV | sifaka:/volume1/music | Read-only |
| /data | Local PVC (10Gi) | minikube storage class | Read-write |
The `/data` directory contains:
- SQLite database
- Configuration
- Cache files
## Configuration
Environment variables set in deployment:
- `ND_SCANSCHEDULE=1h` - Rescan library every hour
- `ND_LOGLEVEL=info` - Standard logging level
- `ND_MUSICFOLDER=/music` - Music library path
- `ND_DATAFOLDER=/data` - Data directory path
## Initial Setup
On first access, Navidrome prompts you to create an admin user. There are no default credentials.
# Operations
## Sync Application
```bash
argocd app sync navidrome
```
## Check Status
```bash
argocd app get navidrome
kubectl --context=minikube-indri -n navidrome get pods
kubectl --context=minikube-indri -n navidrome logs deploy/navidrome
```
## Verify NFS Mount
```bash
kubectl --context=minikube-indri -n navidrome exec deploy/navidrome -- ls /music
```
## Force Library Rescan
Access Settings > Library in the web UI, or trigger via API:
```bash
curl -X POST https://dj.ops.eblu.me/api/library/scan -H "x-nd-authorization: Bearer <token>"
```
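Navidrome also implements the Subsonic API, whose `startScan` endpoint is another way to request a scan. The sketch below is an alternative to the call above, not a replacement for it. The helper names and the client name `blumeops-docs` are placeholders. Subsonic token authentication sends `md5(password + salt)` instead of the password itself.

```bash
# Compute a Subsonic auth token: the hex MD5 of the password followed by the salt.
subsonic_token() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s%s' "$1" "$2" | md5sum | cut -d' ' -f1
  else
    # macOS ships `md5` rather than `md5sum`
    printf '%s%s' "$1" "$2" | md5 -q
  fi
}

# Build the startScan URL (hypothetical helper).
# Arguments: user, password, salt. Values are not URL-encoded here,
# so keep the user name and salt alphanumeric.
subsonic_scan_url() {
  local token
  token=$(subsonic_token "$2" "$3")
  printf 'https://dj.ops.eblu.me/rest/startScan?u=%s&t=%s&s=%s&v=1.16.1&c=blumeops-docs&f=json\n' \
    "$1" "$token" "$3"
}
```

A call might look like `curl -s "$(subsonic_scan_url admin "$NAVIDROME_PASSWORD" "$(openssl rand -hex 8)")"`, where `admin` and `NAVIDROME_PASSWORD` stand in for real credentials.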
# Related
- [[jellyfin]] - Video streaming (runs on indri directly)
- [[argocd]] - GitOps deployment
- [[blumeops]] - Infrastructure overview

docs/postgresql.md Normal file

@ -0,0 +1,131 @@
---
id: postgresql
aliases:
- postgresql
- postgres
- pg
tags:
- blumeops
---
# PostgreSQL Management Log
A PostgreSQL database cluster running in Kubernetes (minikube on indri) via the CloudNativePG operator, providing storage for [[miniflux]] and other services.
## Quick Connect
```bash
# Connect as superuser (fetches password from 1Password)
PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -d miniflux
```
## Service Details
- URL: tcp://pg.tail8d86e.ts.net:5432
- Metrics: http://cnpg-metrics.tail8d86e.ts.net:9187/metrics
- Namespace: databases
- Cluster name: blumeops-pg
- Operator: CloudNativePG
- ArgoCD app: blumeops-pg
## Databases
| Database | Owner | Purpose |
|----------|----------|----------------------------|
| miniflux | miniflux | Miniflux feed reader data |
## Users
| User | Role | Purpose |
|-----------|------------------|------------------------|
| postgres | superuser | CNPG internal |
| miniflux | app owner | Owns miniflux database |
| eblume | superuser | Admin access |
| borgmatic | pg_read_all_data | Backup access |
## Useful Commands
```bash
# List databases
PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\l"
# List users
PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\du"
# View CNPG cluster status
kubectl -n databases get cluster blumeops-pg
# View pod logs
kubectl -n databases logs -f blumeops-pg-1
```
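The `psql` commands above repeat the same 1Password lookup. A small wrapper could factor that out. This is only a sketch: the function name `bpsql` is hypothetical and nothing in the repo is known to define it.

```bash
# Hypothetical wrapper: fetch the eblume password from 1Password once,
# then pass any remaining arguments straight through to psql.
bpsql() {
  local password
  password=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) || return 1
  PGPASSWORD="$password" psql -h pg.tail8d86e.ts.net -U eblume "$@"
}
```

With that in a shell profile, `bpsql -d miniflux` opens a session and `bpsql -c '\l'` lists databases.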
## Backup
PostgreSQL data is backed up via borgmatic from indri using the `postgresql_databases` hook, which streams pg_dump directly to Borg for consistent backups.
Borgmatic config (`~/.config/borgmatic/config.yaml`):
```yaml
postgresql_databases:
- name: miniflux
hostname: pg.tail8d86e.ts.net
port: 5432
username: borgmatic
```
Password is read from `~/.pgpass` (managed by borgmatic ansible role).
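For reference, each `~/.pgpass` entry has the form `hostname:port:database:username:password`, and libpq ignores the file unless its permissions are restricted to the owner (for example `chmod 600`). The sketch below shows how such a line could be generated; the helper names are illustrative and this is not the ansible role's actual template.

```bash
# Escape a single .pgpass field: backslashes first, then colons.
pgpass_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Print one .pgpass entry in the documented field order:
# hostname:port:database:username:password
# Only the password is escaped here; the other fields are assumed
# to contain no colons or backslashes.
pgpass_line() {
  printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$(pgpass_escape "$5")"
}
```

For the backup user above, the call would be `pgpass_line pg.tail8d86e.ts.net 5432 miniflux borgmatic "$password"`.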
## ArgoCD Management
```bash
# Sync cluster changes
argocd app sync blumeops-pg
# Force reconcile
kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
```
**Files:**
- Cluster spec: `argocd/manifests/databases/blumeops-pg.yaml`
- Tailscale service: `argocd/manifests/databases/service-tailscale.yaml`
- Secrets: `secret-eblume.yaml.tpl`, `secret-borgmatic.yaml.tpl` (via `op inject`)
## Credentials
**1Password items:**
- `guxu3j7ajhjyey6xxl2ovsl2ui` - eblume superuser password
- `mw2bv5we7woicjza7hc6s44yvy` - borgmatic user password
**CNPG-managed secrets:**
- `blumeops-pg-app` - miniflux user (auto-generated password)
- `blumeops-pg-eblume` - eblume superuser
- `blumeops-pg-borgmatic` - borgmatic backup user
## Log
### Wed Jan 22 2026
- Added CNPG metrics collection via Tailscale service at `cnpg-metrics.tail8d86e.ts.net:9187`
- Updated PostgreSQL Grafana dashboard to use CNPG metric names (`cnpg_*` prefix)
- Prometheus on indri now scrapes CNPG metrics directly
### Sun Jan 19 2026 (P4)
- **Retired brew PostgreSQL** - k8s CloudNativePG is now the only PostgreSQL
- Renamed Tailscale hostname from `k8s-pg` to `pg` (canonical)
- Removed postgresql ansible role from indri
- Moved .pgpass management to borgmatic role
- Updated borgmatic to back up only `pg.tail8d86e.ts.net`
- Fixed table ownership issue: P3 restore created tables owned by eblume, transferred to miniflux
### Sun Jan 19 2026 (P3)
- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
- Added borgmatic user to k8s-pg via CloudNativePG managed roles
- Both brew and k8s PostgreSQL backed up by borgmatic during migration
- Added Tailscale ACL: `tag:homelab` → `tag:k8s` on port 5432 for backup access
### Thu Jan 16 2026
- Initial setup with PostgreSQL 18 (brew)
- Created miniflux database and user
- Exposed via Tailscale at pg.tail8d86e.ts.net

docs/pulumi.md Normal file

@ -0,0 +1,73 @@
---
id: pulumi
aliases:
- pulumi
- tailnet-iac
tags:
- blumeops
---
# Pulumi Tailnet IaC Management Log
Pulumi manages the tail8d86e.ts.net tailnet configuration, including ACLs, tags, and DNS settings.
## Architecture
Two-layer approach:
- **Layer 1 (Pulumi)**: Tailnet-wide config - ACLs, tags, DNS (this card)
- **Layer 2 (Ansible)**: Node-local `tailscale serve` config - see `tailscale_serve` role
## Service Details
- State backend: Pulumi Cloud (https://app.pulumi.com/eblume/blumeops-tailnet)
- Stack: `tail8d86e`
- Config directory: `pulumi/` in blumeops repo
- Policy file: `pulumi/policy.hujson` (HuJSON with comments)
## Authentication
Uses OAuth client stored in 1Password (blumeops vault):
- Client configured with scopes: acl, dns, devices, services
- Auto-applies `tag:blumeops` to IaC-managed resources
## Useful Commands
```bash
# Preview changes
mise run tailnet-preview
# Apply changes
mise run tailnet-up
# View current state
mise run tailnet-preview
# Pass additional args
mise run tailnet-up -- --yes
```
## Making ACL Changes
1. Edit `pulumi/policy.hujson` in the blumeops repo
2. Run `mise run tailnet-preview` to see what will change
3. Run `mise run tailnet-up` to apply
4. Commit and push
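The steps above could be chained in a small helper so that a failed preview or apply stops the sequence before anything is committed. This is a sketch only: `apply_acl_change` is a hypothetical name, and Pulumi normally asks for its own confirmation on `up`, so the prompt here is just an extra guard.

```bash
# Hypothetical helper that chains preview, apply, commit, and push.
# Stops at the first failing step.
apply_acl_change() {
  local message answer
  message=${1:?usage: apply_acl_change <commit message>}
  mise run tailnet-preview || return 1
  printf 'Apply these changes? [y/N] '
  read -r answer || answer=""
  if [ "$answer" != "y" ]; then
    echo "Aborted; nothing applied."
    return 1
  fi
  mise run tailnet-up || return 1
  git add pulumi/policy.hujson &&
    git commit -m "$message" &&
    git push
}
```

Run from the root of the blumeops repo, for example `apply_acl_change "Allow backup access to pg"`.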
## What's Managed
Currently managed by Pulumi:
- ACL policy (`tailscale:index:Acl`)
Can be added later:
- DNS nameservers (`tailscale:index:DnsNameservers`)
- DNS search paths (`tailscale:index:DnsSearchPaths`)
- Tailnet settings (`tailscale:index:TailnetSettings`)
## Log
### Wed Jan 15 2026
- Initial setup with Pulumi + Python
- Imported existing ACL from Tailscale
- State stored in Pulumi Cloud (free tier)
- OAuth authentication via 1Password

docs/teslamate.md Normal file

@ -0,0 +1,113 @@
---
id: teslamate
aliases:
- teslamate
- tesla
tags:
- blumeops
---
# TeslaMate
TeslaMate is a self-hosted Tesla data logger running in Kubernetes (minikube on indri), collecting and visualizing vehicle data from the Tesla Owner API.
## Service Details
- URL: https://tesla.tail8d86e.ts.net
- Namespace: `teslamate`
- Image: `teslamate/teslamate:2.2.0`
- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net)
- ArgoCD app: `teslamate`
## What TeslaMate Collects
- Battery level, state of charge, range estimates
- Charging sessions (location, energy, cost, duration)
- Drives (distance, efficiency, routes)
- Climate/HVAC usage
- Software update history
- Vampire drain analysis
- Vehicle states (asleep, driving, charging, online)
## Grafana Dashboards
18 dashboards available in Grafana under the "TeslaMate" folder at https://grafana.tail8d86e.ts.net:
- Overview, Charges, Drives, Efficiency, States
- Battery Health, Vampire Drain, Statistics
- Charge Level, Locations, Trip, Mileage
- Drive Stats, Charging Stats, Projected Range
- Timeline, Updates, Visited
Dashboards use the `TeslaMate` PostgreSQL datasource (not Prometheus).
## Useful Commands
```bash
# View logs
kubectl --context=minikube-indri -n teslamate logs -f deployment/teslamate
# Check pod status
kubectl --context=minikube-indri -n teslamate get pods
# Restart deployment
kubectl --context=minikube-indri -n teslamate rollout restart deployment/teslamate
# Sync from ArgoCD
argocd app sync teslamate
```
## Credentials
**1Password items (blumeops vault):**
- `TeslaMate` - contains `db_password` and `api_enc_key` fields
**Kubernetes secrets:**
- `teslamate-db` (teslamate ns) - DATABASE_PASS for PostgreSQL connection
- `teslamate-encryption` (teslamate ns) - ENCRYPTION_KEY for token encryption
- `blumeops-pg-teslamate` (databases ns) - CloudNativePG managed role password
- `grafana-teslamate-datasource` (monitoring ns) - Grafana datasource password
## Backup
TeslaMate data is backed up via [[borgmatic]]:
- PostgreSQL database `teslamate` included in `borgmatic_postgresql_databases`
- Backed up alongside miniflux to sifaka NAS
## Tesla API Authentication
TeslaMate uses Tesla's Owner API (not Fleet API) via OAuth:
1. Access https://tesla.tail8d86e.ts.net
2. Click "Sign in with Tesla"
3. Complete OAuth flow in browser
4. Tokens are encrypted with ENCRYPTION_KEY and stored in database
5. TeslaMate automatically refreshes tokens as needed
**Standalone OAuth tool:** If you need to manually obtain tokens, there's a Rust-based helper:
- Mirror: https://forge.tail8d86e.ts.net/eblume/tesla_auth.git
- Runs OAuth flow and outputs access/refresh tokens
## Database Notes
- TeslaMate requires PostgreSQL 17.3+ or 18.x
- The `teslamate` user has superuser privileges (required for extension management during migrations)
- Extensions used: `cube`, `earthdistance` (for geospatial calculations)
## Related
- [[1767747119-YCPO|BlumeOps]]
- [[argocd|ArgoCD]]
- [[postgresql|PostgreSQL]]
- [[borgmatic|Borgmatic]]
## Log
### Thu Jan 23 2026
- Initial deployment to Kubernetes
- 18 Grafana dashboards imported from TeslaMate project
- Upgraded CloudNativePG 1.25 -> 1.28 for major version upgrade support
- Upgraded PostgreSQL 17.2 -> 18.1 (required for TeslaMate 2.2.0)
- Tailscale Ingress at `tesla.tail8d86e.ts.net`
- Backup configuration added to borgmatic

docs/transmission.md Normal file

@ -0,0 +1,100 @@
---
id: transmission
aliases:
- transmission
tags:
- blumeops
---
# Transmission Management Log
Transmission is a BitTorrent daemon running in Kubernetes, primarily used to download large ZIM archives for [[kiwix|Kiwix]].
## Service Details
- URL: https://torrent.tail8d86e.ts.net
- Namespace: `torrent`
- Image: `lscr.io/linuxserver/transmission:latest`
- ArgoCD app: `torrent`
- Storage: NFS PVC from sifaka (`/volume1/torrents`)
## Useful Commands
```bash
# View transmission logs
kubectl --context=minikube-indri -n torrent logs -f deployment/transmission
# Check RPC connectivity (from another pod)
kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \
curl -s http://transmission.torrent.svc.cluster.local:9091/transmission/rpc
# Sync from ArgoCD
argocd app sync torrent
```
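Transmission's RPC endpoint guards against CSRF: a request without a valid `X-Transmission-Session-Id` header is answered with HTTP 409 and the header value to use. The bare `curl` above will therefore usually show a 409 response, which still confirms connectivity. A minimal sketch of a real RPC call follows; the helper names are hypothetical and it assumes RPC authentication is not enabled.

```bash
RPC_URL=http://transmission.torrent.svc.cluster.local:9091/transmission/rpc

# Read HTTP response headers on stdin and print the session ID, if present.
extract_session_id() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-transmission-session-id" { print $2; exit }'
}

# Hypothetical helper: fetch a session ID, then POST one RPC request body.
transmission_rpc() {
  local body session_id
  body=$1
  # -D - writes the response headers to stdout; the body is discarded.
  session_id=$(curl -s -o /dev/null -D - "$RPC_URL" | extract_session_id)
  curl -s -H "X-Transmission-Session-Id: $session_id" -d "$body" "$RPC_URL"
}
```

From a pod inside the cluster, `transmission_rpc '{"method":"session-stats"}'` should return a JSON summary of torrent counts and speeds.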
## ArgoCD Management
Transmission is deployed via ArgoCD from `argocd/manifests/torrent/`:
- `deployment.yaml` - Transmission container with NFS volume
- `service.yaml` - ClusterIP service (port 9091)
- `ingress-tailscale.yaml` - Tailscale Ingress for web UI
- `pv-nfs.yaml` - NFS PersistentVolume
- `pvc.yaml` - PersistentVolumeClaim
## Storage Layout
The NFS share on sifaka (`/volume1/torrents`) has this structure:
- `/downloads/` - Active downloads and torrent metadata
- `/downloads/complete/` - Completed downloads
- `/config/` - Transmission configuration
- `/watch/` - Watch directory for .torrent files
Kiwix reads from `/downloads/complete/` to serve ZIM archives.
## Integration with Kiwix
The [[kiwix]] deployment includes a torrent-sync sidecar that:
1. Reads the declarative ZIM torrent list from a ConfigMap
2. Adds missing torrents to Transmission via RPC
3. Runs on startup and every 30 minutes
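The sidecar's implementation is not shown in this card. As an illustration of step 2 only, the "missing" set is the desired list minus what Transmission already has. The sketch assumes both lists are plain files with one torrent identifier per line, which is an assumption about the format rather than a description of the real ConfigMap.

```bash
# Print entries of the desired list that are absent from the existing list.
# Arguments: desired-list file, existing-list file (one identifier per line).
missing_torrents() {
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
  # grep exits 1 when nothing is missing, so that status is masked.
  grep -vxFf "$2" "$1" || true
}
```

Each line this prints would then be submitted to Transmission through an RPC `torrent-add` request.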
When downloads complete:
1. Transmission moves files to `/downloads/complete/`
2. The zim-watcher CronJob (in kiwix namespace) detects new ZIMs
3. Kiwix deployment is restarted to pick up new archives
## Monitoring
**TODO:** Write custom transmission exporter. Existing exporters (`metalmatze/transmission-exporter`, `sandrotosi/simple_transmission_exporter`) are incompatible with Transmission 4's changed JSON API (type mismatches in `lastScrapeTimedOut` field).
Current monitoring via web UI at https://torrent.tail8d86e.ts.net:
- Active/seeding/paused torrent counts
- Upload/download speeds
- Disk usage
Basic uptime monitoring via blackbox probe in [[alloy|Alloy k8s]] (see Services Health dashboard).
## Log
### 2026-01-22
- Attempted to add `metalmatze/transmission-exporter` sidecar for Prometheus metrics
- Exporter failed with JSON parsing errors - incompatible with Transmission 4 API changes
- Removed exporter sidecar, dashboard, and Prometheus scrape config
- Added basic HTTP probe via Alloy k8s blackbox exporter instead
- Deleted stale `transmission.prom` textfile from indri
### 2026-01-21 (P6)
- **Migrated to Kubernetes** (Phase 6 of k8s migration)
- NFS PersistentVolume for storage on sifaka
- Tailscale Ingress at `torrent.tail8d86e.ts.net`
- RPC accessible to kiwix namespace for torrent sync
- Moved existing ZIM files to `/downloads/complete/` for seeding
- Retired ansible transmission role from indri
### 2026-01-14
- Added transmission role to ansible playbook
- Integrated with kiwix role for torrent-based ZIM downloads
- Initial setup with transmission-cli via homebrew
- Managed via brew services on port 9091
- Metrics collected via textfile exporter

docs/zot.md Normal file

@ -0,0 +1,112 @@
---
id: zot
aliases:
- zot
- container-registry
tags:
- blumeops
---
# Zot Registry Management Log
Zot is an OCI-native container registry running on Indri, providing:
1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
2. Private image storage for custom-built containers
## Service Details
- URL: https://registry.ops.eblu.me
- Local port: 5050
- Data directory: ~/zot
- Config: ~/.config/zot/config.json
- Managed via: mcquack LaunchAgent
## Namespace Convention
| Path | Source |
|------|--------|
| `registry.../docker.io/*` | Cached from Docker Hub |
| `registry.../ghcr.io/*` | Cached from GHCR |
| `registry.../quay.io/*` | Cached from Quay |
| `registry.../blumeops/*` | Private images (yours) |
## How It Works
### Pull-Through Cache (Automatic)
When [[minikube]] pulls an image like `docker.io/library/nginx:latest`:
1. Containerd checks zot first (via `host.minikube.internal:5050`)
2. If zot has it cached, returns immediately
3. If not, zot fetches it from upstream, caches it, and returns it to k8s
Cached images appear under their original registry path (e.g., `docker.io/library/nginx`).
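One way to check whether an upstream image has been cached is to query the registry's tags endpoint under that path. The sketch below only builds the URL. The helper names are illustrative, and the base URL is passed in as an argument rather than assumed.

```bash
# Strip a trailing ":tag" or "@digest" from an image reference.
image_repository() {
  local ref name
  ref=${1%%@*}
  name=${ref##*/}
  case $name in
    *:*) printf '%s\n' "${ref%:*}" ;;
    *)   printf '%s\n' "$ref" ;;
  esac
}

# Build the tags-list URL for an image as stored in zot.
# Arguments: zot base URL, image reference (tag or digest optional).
zot_tags_url() {
  printf '%s/v2/%s/tags/list\n' "${1%/}" "$(image_repository "$2")"
}
```

For example, `curl -s "$(zot_tags_url http://indri:5050 docker.io/library/nginx:latest)" | jq` lists the cached tags, or returns an error body if the image has never been pulled.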
### Private Images (Manual Push)
Build and push from gilbert using podman:
```bash
# Build
podman build -t registry.ops.eblu.me/blumeops/myapp:v1 .
# Push to zot
podman push registry.ops.eblu.me/blumeops/myapp:v1
# Use in k8s manifest
#   image: registry.ops.eblu.me/blumeops/myapp:v1
```
Private images go under `blumeops/*` namespace. Example: the devpi container is at `registry.ops.eblu.me/blumeops/devpi:latest`.
### Security Model
**Network access only** - no authentication configured. Anyone who can reach zot via Tailscale ACL can push/pull any image. Defense is the tailnet boundary.
Zot supports htpasswd/LDAP/OIDC auth if needed in the future.
## Minikube Integration
The [[minikube]] cluster uses zot as a registry mirror via containerd configuration. Managed by the ansible `minikube` role.
From inside minikube, zot is at `host.minikube.internal:5050`. Containerd tries the mirror first, falls back to upstream if not cached.
Mirrors configured for: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io`
## Useful Commands
```bash
# List all cached/pushed images
curl -s http://indri:5050/v2/_catalog | jq
# List tags for an image
curl -s http://indri:5050/v2/blumeops/devpi/tags/list | jq
# Check service status
ssh indri 'launchctl list | grep zot'
# View logs
ssh indri 'tail -f ~/Library/Logs/mcquack.zot.err.log'
```
## Log
### 2026-01-25
- **Migrated from Tailscale serve to Caddy** - now accessible at `registry.ops.eblu.me`
- Retired `tailscale_serve` ansible role (no longer needed)
- Updated minikube containerd config to use new URL
- Updated CI workflows and mise tasks
- Old URL (`registry.tail8d86e.ts.net`) deprecated
### 2026-01-21
- Verified full workflow: podman build on gilbert → push to zot → k8s pull
- Documented security model (network-only auth via Tailscale ACL)
- Updated minikube integration: now uses containerd (docker driver) instead of CRI-O (podman driver)
- Mirror endpoint changed from `host.containers.internal:5050` to `host.minikube.internal:5050`
### 2026-01-19
- Integrated with minikube as CRI-O registry mirror
- All k8s image pulls now go through zot cache automatically
### 2026-01-18
- Initial setup for k8s migration Phase 0
- Configured pull-through cache for Docker Hub, GHCR, Quay
- Exposed via Tailscale service at registry.tail8d86e.ts.net


@ -3,11 +3,12 @@
set -euo pipefail
-ZK_DIR="$HOME/code/personal/zk"
-MAIN_CARD="$ZK_DIR/1767747119-YCPO.md"
+# Blumeops docs now live in the repo itself (symlinked into zk)
+DOCS_DIR="$(cd "$(dirname "$0")/.." && pwd)/docs"
+MAIN_CARD="$DOCS_DIR/1767747119-YCPO.md"
 # Find all files tagged with blumeops (excluding main card)
-other_cards=$(grep -l '^ - blumeops$' "$ZK_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort)
+other_cards=$(grep -l '^ - blumeops$' "$DOCS_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort)
# Concatenate: main card first, then others
# Pass through any args to bat (e.g., --style=header --color=never --decorations=always)