From a7d771d9458c76b2d654a176a82884a0ea7e41a8 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Mon, 2 Feb 2026 19:09:19 -0800 Subject: [PATCH] Add docs/ directory with blumeops zk cards Move 21 blumeops-tagged zettelkasten cards from ~/code/personal/zk/ to docs/ in this repository. These files are symlinked back into the zk at ~/code/personal/zk/blumeops for seamless obsidian.nvim integration. This enables: - Git-managed documentation in the blumeops repo - Preserved wiki links between blumeops docs - obsidian-sync isolation (docs don't sync to other devices) - Direct editing via obsidian.nvim with the blumeops workspace Also updates zk-docs mise task to read from local docs/ directory. Co-Authored-By: Claude Opus 4.5 --- docs/1767747119-YCPO.md | 247 +++++++++++++++++++++++++++++++++++++++ docs/1768246525-RVRY.md | 136 +++++++++++++++++++++ docs/1768283761-TRXN.md | 95 +++++++++++++++ docs/1768457769-LOCK.md | 149 +++++++++++++++++++++++ docs/1768506761-GHUW.md | 167 ++++++++++++++++++++++++++ docs/1768506761-XGYX.md | 82 +++++++++++++ docs/argocd.md | 140 ++++++++++++++++++++++ docs/borgmatic.md | 176 ++++++++++++++++++++++++++++ docs/external-secrets.md | 75 ++++++++++++ docs/grafana.md | 58 +++++++++ docs/indri.md | 65 +++++++++++ docs/jellyfin.md | 90 ++++++++++++++ docs/kiwix.md | 103 ++++++++++++++++ docs/miniflux.md | 83 +++++++++++++ docs/minikube.md | 137 ++++++++++++++++++++++ docs/navidrome.md | 80 +++++++++++++ docs/postgresql.md | 131 +++++++++++++++++++++ docs/pulumi.md | 73 ++++++++++++ docs/teslamate.md | 113 ++++++++++++++++++ docs/transmission.md | 100 ++++++++++++++++ docs/zot.md | 112 ++++++++++++++++++ mise-tasks/zk-docs | 7 +- 22 files changed, 2416 insertions(+), 3 deletions(-) create mode 100644 docs/1767747119-YCPO.md create mode 100644 docs/1768246525-RVRY.md create mode 100644 docs/1768283761-TRXN.md create mode 100644 docs/1768457769-LOCK.md create mode 100644 docs/1768506761-GHUW.md create mode 100644 docs/1768506761-XGYX.md create 
mode 100644 docs/argocd.md create mode 100644 docs/borgmatic.md create mode 100644 docs/external-secrets.md create mode 100644 docs/grafana.md create mode 100644 docs/indri.md create mode 100644 docs/jellyfin.md create mode 100644 docs/kiwix.md create mode 100644 docs/miniflux.md create mode 100644 docs/minikube.md create mode 100644 docs/navidrome.md create mode 100644 docs/postgresql.md create mode 100644 docs/pulumi.md create mode 100644 docs/teslamate.md create mode 100644 docs/transmission.md create mode 100644 docs/zot.md diff --git a/docs/1767747119-YCPO.md b/docs/1767747119-YCPO.md new file mode 100644 index 0000000..5321c89 --- /dev/null +++ b/docs/1767747119-YCPO.md @@ -0,0 +1,247 @@ +--- +id: 1767747119-YCPO +aliases: + - blumeops + - BlumeOps +tags: + - blumeops +--- + +BlumeOps, aka Blue Mops, refers to my own personal computing operations stack. + +Source code: https://forge.ops.eblu.me/eblume/blumeops (mirrored to https://github.com/eblume/blumeops) + +# Infrastructure + +| Host | Description | Notes | +|----------------------------------|--------------------------|----------------------------------------------------| +| **[[indri|Indri]]** | Mac Mini M1, 2020 | Primary server, 2TB internal disk | +| **[Sifaka](https://nas.ops.eblu.me)** | Synology NAS | 10.9TB RAID 5, backup target | +| **Gilbert** | 13" MacBook Air M4, 2025 | Primary workstation | +| **Mouse** | 13" MacBook Air M2 | Allison's laptop | +| **[UniFi](https://192.168.1.1)** | UniFi Express 7 | Home WiFi network ([cloud](https://unifi.ui.com)) | +| **Dwarf** | iPad Air | Employer-provided, off tailnet | + +All devices are connected via [Tailscale](https://login.tailscale.com/) tailnet `tail8d86e.ts.net`. + +## Tailscale Access Control + +ACLs are managed via Pulumi in `pulumi/policy.hujson`. See [[pulumi]] for deployment commands. 
+ +**Important lesson learned:** +- Don't tag user-owned devices (like gilbert) - tagging converts them to "tagged devices" which lose user identity and break user-based SSH rules + +### Groups + +| Group | Members | Purpose | +|---------------------|--------------------------------------------|----------------------------------| +| `group:allisonflix` | blume.erich@gmail.com, acmdavis@gmail.com | Jellyfin media access | + +### Device Tags + +| Tag | Devices | Purpose | +|------------------|-------------|--------------------------------------------| +| `tag:homelab` | indri | Server infrastructure | +| `tag:nas` | sifaka | Network-attached storage for backups | +| `tag:blumeops` | indri, sifaka | Resources managed by Pulumi IaC | +| `tag:registry` | indri | Container registry access | +| `tag:k8s-api` | indri | Kubernetes API server access | + +### Access Matrix + +| Source | Kiwix | Forge | PyPI | Miniflux | PostgreSQL | NAS | Grafana | Loki | +|--------------------------|-------|-------|------|----------|------------|-----|---------|------| +| `autogroup:admin` | Y | Y | Y | Y | Y | Y | Y | Y | +| `autogroup:member` | Y | Y | Y | Y | Y | - | - | - | +| `tag:homelab` | - | - | - | - | - | Y | - | - | + +Notes: +- **Admins** - full access to all services via `autogroup:admin` +- **Allison** (`acmdavis@gmail.com`) - member services only, no Grafana/Loki/NAS + +### SSH Access + +| Source | Destinations | Auth | +|-------------------------|-----------------|-------------| +| `autogroup:member` | `autogroup:self`| check | +| `autogroup:admin` | `tag:homelab` | check (12h) | +| `autogroup:admin` | `tag:nas` | check (12h) | + +# Services + +Services are accessible via two DNS domains: +- **`*.ops.eblu.me`** - Caddy reverse proxy (reachable from k8s pods, docker containers, and tailnet) +- **`*.tail8d86e.ts.net`** - Tailscale MagicDNS (tailnet clients only, not from k8s/docker) + +## Caddy Services (`*.ops.eblu.me`) + +Caddy proxies to k8s services via their Tailscale 
endpoints (traffic stays local on indri). +Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs work - use ops.eblu.me for access from pods/containers. + +| Service | URL | Description | Management Log | +|----------------|-----------------------------------|------------------------------------|-----------------| +| **Homepage** | https://go.ops.eblu.me | Service dashboard / start page | — | +| **Forgejo** | https://forge.ops.eblu.me | Git hosting (SSH: port 2222) | [[forgejo]] | +| **Registry** | https://registry.ops.eblu.me | OCI container registry (Zot) | [[zot]] | +| **Sifaka NAS** | https://nas.ops.eblu.me | Synology NAS dashboard | — | +| **Grafana** | https://grafana.ops.eblu.me | Dashboards & observability (k8s) | [[grafana]] | +| **ArgoCD** | https://argocd.ops.eblu.me | GitOps continuous delivery (k8s) | [[argocd]] | +| **Prometheus** | https://prometheus.ops.eblu.me | Metrics collection (k8s) | [[prometheus]] | +| **Loki** | https://loki.ops.eblu.me | Log aggregation (k8s) | [[loki]] | +| **Miniflux** | https://feed.ops.eblu.me | RSS/Atom feed reader (k8s) | [[miniflux]] | +| **PyPI** | https://pypi.ops.eblu.me | PyPI caching proxy (devpi, k8s) | [[pypi]] | +| **Kiwix** | https://kiwix.ops.eblu.me | Offline Wikipedia & ZIM (k8s) | [[argocd]] | +| **Torrent** | https://torrent.ops.eblu.me | BitTorrent daemon web UI (k8s) | [[argocd]] | +| **TeslaMate** | https://tesla.ops.eblu.me | Tesla data logger (k8s) | [[teslamate]] | +| **Immich** | https://photos.ops.eblu.me | Photo management (k8s Helm, CNPG) | [[argocd]] | +| **DJ** | https://dj.ops.eblu.me | Music streaming server (Navidrome) | [[navidrome]] | +| **PostgreSQL** | pg.ops.eblu.me:5432 | Database server (k8s CloudNativePG)| [[postgresql]] | + +## Tailscale-Only Services (`*.tail8d86e.ts.net`) + +These services are only accessible via Tailscale (not from k8s pods/containers): + +| Service | URL | Description | Management Log | 
+|----------------|-----------------------------------|------------------------------------|-----------------| +| **Kubernetes** | https://k8s.tail8d86e.ts.net | Minikube API (TCP passthrough) | [[minikube]] | +| **Jellyfin** | https://jellyfin.ops.eblu.me | Media server (VideoToolbox HW) | [[jellyfin]] | + +Supporting services (not directly user-facing): + +| Service | Description | Management Log | +|---------------------|---------------------------------------|------------------| +| **Alloy (indri)** | Metrics & logs collector (indri host) | [[alloy]] | +| **Alloy (k8s)** | Pod log collection & service probes | [[alloy]] | +| **Kube-state-metrics** | K8s resource metrics (pods, deployments) | [[prometheus]] | +| **Borgmatic** | Daily backups to Sifaka NAS (2:00 AM) | [[borgmatic]] | + +## Port Map (Indri) + +| Port | Service | Protocol | Binding | Notes | +|-------|---------------|----------|-------------|--------------------------------------------| +| 443 | Caddy | HTTPS | 0.0.0.0 | Reverse proxy for `*.ops.eblu.me` | +| 2222 | Caddy L4 | TCP | 0.0.0.0 | SSH proxy → Forgejo (localhost:2200) | +| 5432 | Caddy L4 | TCP | 0.0.0.0 | PostgreSQL proxy → k8s pg | +| 2200 | Forgejo SSH | TCP | localhost | Built-in SSH server | +| 3001 | Forgejo | HTTP | localhost | Web UI (proxied by Caddy) | +| 5050 | Zot | HTTP | localhost | Registry API (proxied by Caddy) | +| 8096 | Jellyfin | HTTP | localhost | Media server (proxied by Caddy) | +| 44491 | K8s API | HTTPS | 0.0.0.0 | Minikube API server (via Tailscale k8s.*) | + +# Service Management + +## Pulumi (Tailnet IaC) + +Tailnet-wide configuration (ACLs, tags, DNS) is managed via Pulumi. See [[pulumi]] for details. + +```bash +mise run tailnet-preview # preview ACL changes +mise run tailnet-up # apply ACL changes +``` + +Edit `pulumi/policy.hujson` to modify ACLs or add new tags. + +## Ansible + +Services on Indri are managed via ansible. 
Playbooks live in the `ansible/` directory of the blumeops repo: + +```bash +mise run provision-indri # runs ansible/playbooks/indri.yml +mise run indri-services-check # checks health of all services +``` + +Run with `--check --diff` first to preview changes, or target specific services: + +```bash +mise run provision-indri -- --check --diff # dry run +mise run provision-indri -- --tags alloy # only alloy +mise run provision-indri -- --tags zot,borgmatic # multiple tags +``` + +## Adding a New Service + +### Indri Services (via Caddy) + +For services running directly on indri that need to be accessible from k8s pods: + +1. Host service locally on localhost (e.g., localhost:3000) +2. Add service to `ansible/roles/caddy/defaults/main.yml` under `caddy_services` +3. Run `mise run provision-indri -- --tags caddy` +4. Add backup entry in borgmatic role if needed + +DNS is handled by a wildcard record (`*.ops.eblu.me` → indri's Tailscale IP) managed via Pulumi in `pulumi/gandi/`. + +Access via `https://foo.ops.eblu.me`. + +### K8s Services (via Tailscale Ingress) + +For services running in minikube: + +1. Create Kubernetes manifests in `argocd/manifests//` +2. Add ArgoCD Application in `argocd/apps/.yaml` +3. Add Tailscale Ingress annotation for `*.tail8d86e.ts.net` hostname +4. Add Homepage annotations to the Ingress for dashboard discovery (see below) +5. Add Caddy proxy entry in `ansible/roles/caddy/defaults/main.yml` +6. Sync via ArgoCD: `argocd app sync ` + +Access via `https://foo.ops.eblu.me` (preferred) or `https://foo.tail8d86e.ts.net`. + +**Note:** K8s services using Tailscale Ingress are NOT accessible from other k8s pods or docker containers. Use Caddy (`*.ops.eblu.me`) if pod-to-service communication is needed. 
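The hostname rule in the note above can be sketched as a small shell helper. This is only an illustration: `service_url` and its `pod`/`tailnet` caller argument are made-up names that do not exist in the repo, and `foo` is the same placeholder service name used above.

```bash
# Hypothetical helper (not in the blumeops repo): print the URL(s) a caller
# should use for a k8s service, following the note above.
#   $1 = service name (e.g. foo)
#   $2 = where the caller runs: "pod" (k8s pod / docker container) or "tailnet"
service_url() {
  name="$1"
  caller="${2:-tailnet}"
  case "$caller" in
    pod)
      # Tailscale Ingress hostnames are not reachable from pods/containers,
      # so the Caddy hostname is the only option.
      printf 'https://%s.ops.eblu.me\n' "$name"
      ;;
    tailnet)
      # Both hostnames work from tailnet clients; the Caddy one is preferred.
      printf 'https://%s.ops.eblu.me\n' "$name"
      printf 'https://%s.tail8d86e.ts.net\n' "$name"
      ;;
    *)
      echo "unknown caller: $caller" >&2
      return 1
      ;;
  esac
}
```

`service_url foo pod` prints only the Caddy URL, while `service_url foo tailnet` prints both with the preferred one first.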
+ +**Homepage annotations** for automatic dashboard discovery: +```yaml +annotations: + gethomepage.dev/enabled: "true" + gethomepage.dev/name: "My Service" + gethomepage.dev/group: "Apps" + gethomepage.dev/icon: "myservice.png" + gethomepage.dev/description: "Short description" + gethomepage.dev/href: "https://myservice.ops.eblu.me" + gethomepage.dev/pod-selector: "app=myservice" +``` + +Icons use [Dashboard Icons](https://github.com/walkxcode/dashboard-icons) format (e.g., `grafana.png`, `prometheus.png`). The `pod-selector` annotation enables pod status badges on the dashboard. + +## Secrets Management + +Kubernetes secrets are managed via [[external-secrets|External Secrets Operator]], which syncs from 1Password via 1Password Connect. + +To add a secret to a k8s service: +1. Ensure the 1Password item exists in the `blumeops` vault +2. Create an `ExternalSecret` manifest in the service's directory +3. Reference the `onepassword-blumeops` ClusterSecretStore +4. Sync via ArgoCD + +See [[external-secrets]] for detailed usage and bootstrap instructions. + +# Notes + +## Go DNS Resolution on macOS + +**Important lesson learned (2026-01-22):** +Go programs built with `CGO_ENABLED=0` (pure Go) use a DNS resolver that reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files. This breaks Tailscale MagicDNS resolution. + +**Solution:** Build Go programs with `CGO_ENABLED=1` to use the macOS native resolver. This is why [[alloy|Grafana Alloy]] is built from source rather than using the Homebrew bottle. + +## Remote Kubernetes Access (from Gilbert) + +The minikube cluster on indri is accessible from gilbert via Tailscale service. +Cluster was created with `--apiserver-names=k8s.tail8d86e.ts.net,indri --listen-address=0.0.0.0`. +API server exposed at `https://k8s.tail8d86e.ts.net` via TCP passthrough (preserves mTLS). 
+ +**Fish abbreviations** (in `~/.config/fish/config.fish`): +- `ki` -> `kubectl --context=minikube-indri` +- `k9i` -> `k9s --context=minikube-indri` +- `k9` -> `k9s` + +```bash +# Quick access via abbreviations +ki get nodes +k9i + +# Or explicitly set context +kubectl config use-context minikube-indri +kubectl get nodes +``` + +Credentials are stored in 1Password and fetched via exec credential plugin. See [[minikube]] for details. diff --git a/docs/1768246525-RVRY.md b/docs/1768246525-RVRY.md new file mode 100644 index 0000000..f535b34 --- /dev/null +++ b/docs/1768246525-RVRY.md @@ -0,0 +1,136 @@ +--- +id: 1768246525-RVRY +aliases: + - forgejo + - forge +tags: + - blumeops + - forgejo + - git + - scm + - forge +--- + +# Mon Jan 12 11:35 + +```fish +❯ brew install forgejo +❯ brew --prefix forgejo +/opt/homebrew/opt/forgejo +❯ brew services start forgejo +==> Successfully started `forgejo` (label: homebrew.mxcl.forgejo) +``` + +From the service definition I can see that this runs as: + +```bash +/opt/homebrew/opt/forgejo/bin/forgejo web --work-path /opt/homebrew/var/forgejo > /opt/homebrew/var/log/forgejo.log 2> /opt/homebrew/var/log/forgejo.log +``` +It sounds from the docs like this means the config file should live at: +``` +/opt/homebrew/var/forgejo/custom/conf/app.ini +``` +Ah, based on the logs, it looks like forgejo has picked port 3000 which is used by grafana: +``` +❯ lsof -nP -iTCP:3000 -sTCP:LISTEN +COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME +grafana 1530 erichblume 15u IPv6 0x4acfad8b21dcb063 0t0 TCP *:3000 (LISTEN) +``` +Ok I've set a basic config for port 3001, and then gone through the basic app setup. Looks like it's working! Not sure how SSH works yet though. Let's get this service registered. + +Ok so the next issue is that I want to use ssh as my primary git interface, and +I want that to look to users like I'm using port 22 but I want to host it on +indri which has its own separate ssh setup. Hmm. Let's tell forgejo to use port +2200. 
Ah perfect, we can set SSH_PORT to 22 and SSH_LISTEN_PORT to 2200. + +Hmm, let's stop running this as me and run as a new user, 'forgejo'. +```fish +sudo sysadminctl -addUser forgejo -system -shell /usr/bin/false +sudo chown -R forgejo:staff /opt/homebrew/var/forgejo +``` +Ok, I think I need to switch all my services on this host over to a services file. + +Wow, missing from the above is like 4 hours of deep diving in to the particulars of tailscale service definition hosting. In the end, I never got a services file to work - and yes, I did remember to advertise! Adding to the complexity is that I didn't discover until the end that you can't do "hairpinning", ie you CANNOT use the tailnet service name from the host doing that hosting. I probably had it fixed at some point hours ago and ruled it out because I didn't know about the hairpinning issue. So anyway... what ended up working was to just use the cli: +```fish +tailscale serve --service="svc:forge" --tcp=22 tcp://localhost:2200 +tailscale serve --service="svc:forge" --https=443 http://localhost:3001 +``` +That's it. Nothing else needed, worked right away. Sheesh. (Ok there was also a +solid hour spent on permission issues... I honestly don't know how it's working +now, as there is now a `forgejo` user and the config says to use it but the +files are all owned by `erichblume:staff` but with group permissions set... in +any case, it friggin' works. So I'm happy. 
+ +# Configuration (Ansible-Managed) + +As of 2026-01-23, the `app.ini` is managed by ansible: +- Template: `ansible/roles/forgejo/templates/app.ini.j2` +- Secrets fetched from 1Password in playbook pre_tasks +- Secrets item: "Forgejo Secrets" in blumeops vault (fields: `lfs-jwt-secret`, `internal-token`, `oauth2-jwt-secret`, `runner_reg`) + +Deploy config changes: +```bash +mise run provision-indri -- --tags forgejo +``` + +# Forgejo Actions (CI/CD) + +## Runner (k8s) + +The Forgejo runner runs in Kubernetes with Docker-in-Docker (DinD) for container builds. + +**Architecture:** +- Runner daemon + DinD sidecar in a single pod +- Jobs execute in containers using the `k8s` label +- DinD exposes Docker API on `tcp://127.0.0.1:2375` +- Pods reach `*.ops.eblu.me` services via Caddy reverse proxy + +**Components:** +- ArgoCD app: `argocd/apps/forgejo-runner.yaml` +- Manifests: `argocd/manifests/forgejo-runner/` +- Job image: `registry.ops.eblu.me/blumeops/forgejo-runner` (Node.js + Docker CLI) +- Job image source: `containers/forgejo-runner/` + +**Deployment:** +```bash +# Apply secret (contains runner token from 1Password) +op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f - + +# Sync via ArgoCD +argocd app sync forgejo-runner +``` + +**View logs:** +```bash +kubectl --context=minikube-indri logs -n forgejo-runner -l app=forgejo-runner -c runner +``` + +## Container Build Workflow + +Container images are built via `.forgejo/workflows/build-container.yaml`, triggered by tags matching `-v`. + +**Release a container:** +```bash +mise run container-list # See available containers +mise run container-tag-and-release nettest v1.0.0 # Tag and trigger build +``` + +**Test container** (`containers/nettest/`): Network connectivity test for debugging CI/CD. + +## Workflows + +Workflows live in `.forgejo/workflows/` (not `.github/workflows/`). + +**Important**: Use `github.*` context variables, not `gitea.*`. 
Forgejo supports both at runtime, but: +1. The Forgejo web UI schema validator only recognizes `github.*` +2. `actionlint` pre-commit hook validates workflows locally (catches errors before push) +3. Pass untrusted inputs (like `github.head_ref`) through env vars for security + +## Runner Token + +Stored in 1Password "Forgejo Secrets" item, field `runner_reg`. + +To create a new token: +1. Go to https://forge.ops.eblu.me/admin/actions/runners +2. Click "Create new Runner" +3. Copy the token and update 1Password diff --git a/docs/1768283761-TRXN.md b/docs/1768283761-TRXN.md new file mode 100644 index 0000000..485a2dc --- /dev/null +++ b/docs/1768283761-TRXN.md @@ -0,0 +1,95 @@ +--- +id: 1768283761-TRXN +aliases: + - prometheus +tags: + - blumeops +--- + +# Prometheus Management Log + +Prometheus provides metrics storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure, running in Kubernetes (minikube on indri). + +## Service Details + +- URL: https://prometheus.tail8d86e.ts.net +- Namespace: `monitoring` +- Image: `prom/prometheus:v3.2.1` +- ArgoCD app: `prometheus` +- Storage: 50Gi PVC + +## Data Sources + +### Remote Write (from Alloy) +- Indri system metrics via [[alloy|Grafana Alloy]] remote_write +- Textfile metrics: minikube, borgmatic, zot, jellyfin + +### Scrape Targets +- `sifaka:9100` - Synology NAS (node_exporter in Docker) +- `cnpg-metrics.tail8d86e.ts.net:9187` - CloudNativePG PostgreSQL metrics +- `kube-state-metrics.monitoring.svc:8080` - Kubernetes resource metrics (pods, deployments, etc.) 
+ +## Useful Commands + +```bash +# View logs +kubectl --context=minikube-indri -n monitoring logs -f prometheus-0 + +# Check targets +curl -s https://prometheus.tail8d86e.ts.net/api/v1/targets | jq '.data.activeTargets[].scrapeUrl' + +# Sync from ArgoCD +argocd app sync prometheus +``` + +## ArgoCD Management + +Prometheus is deployed via ArgoCD from `argocd/manifests/prometheus/`: +- `statefulset.yaml` - StatefulSet with 50Gi PVC +- `configmap.yaml` - Prometheus configuration +- `service.yaml` - ClusterIP service +- `ingress-tailscale.yaml` - Tailscale Ingress + +## Log + +### Wed Jan 22 2026 (observability cleanup) + +- Added kube-state-metrics scrape target for k8s resource metrics +- Enhanced Minikube dashboard with namespace filtering and resource usage panels +- Uses `kube_pod_info`, `kube_pod_container_resource_requests`, etc. + +### Wed Jan 22 2026 (later) + +- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet +- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` +- Remote write endpoint now at k8s service, Alloy updated to push there +- Retired ansible prometheus role from indri +- Added ACL grant for `tag:homelab` → `tag:k8s` on port 443 for Alloy access + +### Wed Jan 22 2026 + +Added CNPG PostgreSQL metrics scraping. The CloudNativePG operator exposes Prometheus metrics on port 9187. Exposed via Tailscale at `cnpg-metrics.tail8d86e.ts.net:9187` and added to scrape_configs as job `cnpg-postgres`. + +### Wed Jan 15 2026 + +Prometheus now accepts metrics via remote_write from [[alloy|Grafana Alloy]]. The `--web.enable-remote-write-receiver` flag was added to enable this. + +Indri metrics are no longer scraped - they're pushed by Alloy. Sifaka still uses traditional scraping via node_exporter running in Docker on the Synology. + +### Mon Jan 13 2026 + +Prometheus is now managed via ansible in [[1767747119-YCPO|blumeops]]. Configuration files are templated from the ansible role. 
+ +### Mon Jan 12 2026 21:56 + +Prometheus was stood up about a week ago at this point. I am currently renaming +`localhost` to `indri` in the scrape_configs. While I'm here I'm going to see +if I can add Synology stats. + +I'm adding Container Manager to Sifaka now. I should probably have a Sifaka +management log, but not yet. Downloaded prom/node-exporter and made a container +for it. Using the latest tag because I'm nasty. + +Done. Adding to scrape configs. + +Ok, it didn't like the indri hostname. Could probably fix somehow with either magicdns or /etc/hosts but for now, I'm using `relabel_configs`. This is working. Gotta go to bed. diff --git a/docs/1768457769-LOCK.md b/docs/1768457769-LOCK.md new file mode 100644 index 0000000..87ec9fd --- /dev/null +++ b/docs/1768457769-LOCK.md @@ -0,0 +1,149 @@ +--- +id: 1768457769-LOCK +aliases: + - pypi + - devpi +tags: + - blumeops +--- + +# PyPI / devpi Management Log + +PyPI caching proxy running in Kubernetes (minikube on indri) via devpi-server. 
+ +## Service Details + +- URL: https://pypi.tail8d86e.ts.net +- Namespace: devpi +- Image: registry.tail8d86e.ts.net/blumeops/devpi:latest (custom image with devpi-server + devpi-web) +- ArgoCD app: devpi +- Storage: 50Gi PVC + +## Useful Commands + +```bash +# View logs +kubectl --context=minikube-indri -n devpi logs -f statefulset/devpi + +# Restart pod +kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi + +# Check health +curl https://pypi.tail8d86e.ts.net/+api + +# Sync from ArgoCD +argocd app sync devpi +``` + +## ArgoCD Management + +Devpi is deployed via ArgoCD from `argocd/manifests/devpi/`: +- `statefulset.yaml` - StatefulSet with 50Gi PVC +- `service.yaml` - ClusterIP service +- `ingress-tailscale.yaml` - Tailscale Ingress for external access +- `Dockerfile` - Custom image with startup script +- `start.sh` - Auto-initialization script + +## Users and Indices + +### Structure + +- `root/pypi` - PyPI mirror/cache (auto-created) +- `eblume/dev` - Private packages index (inherits from root/pypi) + +### Creating a User and Index + +```bash +# Login as root +uvx devpi use https://pypi.tail8d86e.ts.net +uvx devpi login root + +# Create user (prompts for password - store in 1Password) +uvx devpi user -c USERNAME email=EMAIL + +# Create index inheriting from PyPI mirror +uvx devpi index -c USERNAME/dev bases=root/pypi +``` + +### Uploading Packages (with uv) + +```bash +# Store credentials (one-time, prompts for username/password) +uv auth login https://pypi.tail8d86e.ts.net + +# Build and publish +cd ~/code/personal/your-package +uv build +uv publish --publish-url https://pypi.tail8d86e.ts.net/eblume/dev/ +``` + +Note: The "trusted publishing failed" warning is expected (devpi doesn't support OIDC). 
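The index URLs used above follow one pattern, which the sketch below spells out. `DEVPI_BASE`, `devpi_index_url`, and `devpi_simple_url` are hypothetical names introduced here for illustration; the only assumption beyond the commands above is that devpi serves a pip-compatible listing under `+simple/` for each index.

```bash
# Hypothetical helpers (not part of the repo) that build devpi URLs.
DEVPI_BASE="https://pypi.tail8d86e.ts.net"

# Upload URL for an index, as passed to `uv publish --publish-url`.
#   $1 = devpi user, $2 = index name
devpi_index_url() {
  printf '%s/%s/%s/\n' "$DEVPI_BASE" "$1" "$2"
}

# pip-compatible "simple" listing for the same index.
devpi_simple_url() {
  printf '%s+simple/\n' "$(devpi_index_url "$1" "$2")"
}
```

For example, `uv publish --publish-url "$(devpi_index_url eblume dev)"` is equivalent to the publish command above, and `devpi_simple_url eblume dev` gives a URL suitable for an installer's index setting.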
+ +### Uploading Packages (with devpi-client) + +```bash +# Login as the user +uvx devpi login USERNAME + +# Use the index +uvx devpi use eblume/dev + +# Upload from project directory +uvx devpi upload +``` + +## Client Configuration + +On workstations, configure pip to use the proxy. + +**pip.conf** (`~/.config/pip/pip.conf`): +```ini +[global] +index-url = https://pypi.tail8d86e.ts.net/root/pypi/+simple/ +trusted-host = pypi.tail8d86e.ts.net +``` + +After creating/editing, track with chezmoi: +```bash +chezmoi add ~/.config/pip/pip.conf +``` + +## Credentials + +- Root password stored in 1Password (blumeops vault) +- Injected into k8s via `devpi-root` secret from `secret-root.yaml.tpl` + +## Backup + +Private packages (`eblume/dev` index) are stored in the devpi PVC. The PyPI mirror cache (`root/pypi`) is not backed up as it can be re-fetched. + +**TODO**: Add devpi PVC backup to borgmatic once k8s volume backup strategy is established. + +## Related + +- [[1767747119-YCPO|BlumeOps project card]] +- [[argocd|ArgoCD]] for deployment +- [[minikube|Kubernetes cluster]] + +## Log + +### Mon Jan 20 2026 + +- **Migrated to Kubernetes** (Phase 5 of k8s migration) +- Custom container image with devpi-server + devpi-web + auto-init startup script +- StatefulSet with 50Gi PVC for data persistence +- Tailscale Ingress at `pypi.tail8d86e.ts.net` +- Root password from 1Password secret, auto-initialized on first run +- Verified pip caching proxy and mcquack package upload +- **Key learnings:** + - Minikube CRI-O can't resolve Tailscale hostnames - added registry mirror config + - devpi-web Whoosh indexer needs ~2Gi during initial PyPI index build + - Kubernetes auto-sets `DEVPI_PORT` for service discovery - renamed to `DEVPI_LISTEN_PORT` +- Removed LaunchAgent from indri, cleared Tailscale serve entry + +### Previous (indri era) + +- Initial setup with devpi on indri via mcquack LaunchAgent +- Connected via Tailscale at pypi.tail8d86e.ts.net +- Created eblume/dev index for 
private packages +- Metrics collection via textfile exporter diff --git a/docs/1768506761-GHUW.md b/docs/1768506761-GHUW.md new file mode 100644 index 0000000..0702d2c --- /dev/null +++ b/docs/1768506761-GHUW.md @@ -0,0 +1,167 @@ +--- +id: 1768506761-GHUW +aliases: + - alloy + - grafana-alloy +tags: + - blumeops +--- + +# Grafana Alloy Management Log + +Grafana Alloy is a unified observability collector with two deployments: +1. **Indri (host)** - System metrics and service logs from macOS host +2. **Kubernetes (DaemonSet)** - Automatic pod log collection and service health probes + +## Service Details + +- Binary: `~/.local/bin/alloy` (built from source with CGO_ENABLED=1) +- Config: `~/.config/grafana-alloy/config.alloy` +- Data: `~/.local/share/grafana-alloy/` +- Logs: `~/Library/Logs/mcquack.alloy.{out,err}.log` +- Managed via: mcquack LaunchAgent (`mcquack.eblume.alloy`) + +**Why built from source?** The Homebrew bottle is built with `CGO_ENABLED=0`, which uses Go's pure DNS resolver. This resolver reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files, breaking Tailscale MagicDNS hostname resolution. Building with `CGO_ENABLED=1` uses the macOS native resolver. 
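One way to check how a given binary was built, assuming a Go toolchain is installed and the binary embeds build info (Go 1.18 or newer): `go version -m` prints the build settings, including `CGO_ENABLED`. `built_with_cgo` below is a hypothetical helper name. It only inspects the build flag and does not by itself prove which resolver is used at runtime.

```bash
# Hypothetical helper: reads `go version -m <binary>` output on stdin and
# succeeds (exit 0) only if the build settings include CGO_ENABLED=1.
built_with_cgo() {
  grep -q 'CGO_ENABLED=1'
}

# Example (run on indri; not executed here):
#   go version -m ~/.local/bin/alloy | built_with_cgo && echo "built with cgo"
```

Running the same check against a Homebrew-installed binary would be a quick way to compare the two builds.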
+ +## What Alloy Collects + +### Metrics +- System metrics via `prometheus.exporter.unix` (same metrics as node_exporter) +- Textfile collector reads from `/opt/homebrew/var/node_exporter/textfile/` + - `minikube.prom` - Minikube cluster status + - `borgmatic.prom` - Backup status metrics + - `zot.prom` - Container registry metrics + - `jellyfin.prom` - Jellyfin media server metrics +- Zot registry metrics scraped from `http://localhost:5050/metrics` +- Metrics pushed to Prometheus (k8s) via remote_write at `https://prometheus.tail8d86e.ts.net/api/v1/write` + +### Logs +Collects logs from all services on Indri: + +**Brew services:** +- forgejo +- tailscale + +**mcquack LaunchAgents:** +- alloy (stdout/stderr) +- borgmatic (stdout/stderr) +- zot (stdout/stderr) +- jellyfin (stdout/stderr) + +Logs pushed to Loki (k8s) at `https://loki.tail8d86e.ts.net/loki/api/v1/push`. + +## Useful Commands + +```bash +# Check service status +ssh indri 'launchctl list | grep alloy' + +# View alloy logs +ssh indri 'tail -f ~/Library/Logs/mcquack.alloy.err.log' + +# Restart service +ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist && launchctl load ~/Library/LaunchAgents/mcquack.eblume.alloy.plist' +``` + +## Building from Source + +Alloy must be built with CGO to use macOS native DNS resolver (required for Tailscale MagicDNS): + +```bash +# On gilbert (dev workstation): +git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy +cd ~/code/3rd/alloy && mise use go@1.25 node yarn +mise x -- make alloy +scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy +``` + +Then run ansible to deploy the config and LaunchAgent. + +## Ansible Management (Indri) + +Alloy on Indri is managed via ansible in [[1767747119-YCPO|blumeops]]. 
+ +```bash +mise run provision-indri -- --tags alloy +``` + +## Kubernetes Alloy (alloy-k8s) + +A separate Alloy DaemonSet runs in k8s for: +- **Automatic pod log collection** - discovers and collects logs from all pods +- **Service health probes** - HTTP blackbox probes for k8s services + +### Service Details (k8s) + +- Namespace: `alloy` +- Image: `grafana/alloy:v1.8.2` +- ArgoCD app: `alloy-k8s` +- Manifests: `argocd/manifests/alloy-k8s/` + +### What k8s Alloy Collects + +**Pod logs (automatic discovery):** +- All pods in all namespaces via `loki.source.kubernetes` +- Labels: namespace, pod, container, node + +**Service health probes:** +- miniflux, kiwix, transmission, devpi, argocd +- Metrics: `probe_success`, `probe_duration_seconds` +- Labels: `job="integrations/blackbox/"` + +### Useful Commands (k8s Alloy) + +```bash +# View alloy-k8s logs +kubectl --context=minikube-indri -n alloy logs -f daemonset/alloy + +# Check running config +kubectl --context=minikube-indri -n alloy get configmap alloy-config -o yaml + +# Sync from ArgoCD +argocd app sync alloy-k8s +``` + +## Log + +### Wed Jan 22 2026 (later) + +- **Added Alloy k8s DaemonSet** for automatic pod log collection +- Logs from all k8s pods now forwarded to Loki with automatic discovery +- Added service health probes for miniflux, kiwix, transmission, devpi, argocd +- New "Services Health" Grafana dashboard shows probe metrics +- Deleted stale textfile metrics (`devpi.prom`, `transmission.prom`) from indri +- Deleted stale data directories (`/opt/homebrew/var/loki`, `/opt/homebrew/var/prometheus`) + +### Wed Jan 22 2026 + +- **Rebuilt from source with CGO_ENABLED=1** - required for Tailscale MagicDNS resolution +- Migrated from Homebrew to mcquack LaunchAgent management +- Updated remote_write to push to k8s Prometheus at `prometheus.tail8d86e.ts.net` +- Updated log push to k8s Loki at `loki.tail8d86e.ts.net` +- Removed prometheus/loki log collection (now running in k8s) +- Binary now at 
`~/.local/bin/alloy`, config at `~/.config/grafana-alloy/` +- Added build instructions to ansible role defaults + +### Mon Jan 20 2026 + +- Removed devpi log collection (devpi migrated to k8s) +- Removed devpi.prom textfile collection (metrics role retired) +- Removed grafana log collection (grafana migrated to k8s in P2) + +### Wed Jan 15 2026 + +- Initial setup replacing node_exporter +- Configured metrics push via remote_write to Prometheus +- Configured log collection for all services, forwarding to Loki + +### Thu Jan 30 2026 + +- Removed Plex log and metrics collection (replaced by Jellyfin) +- Added Jellyfin log collection via mcquack LaunchAgent logs +- Added jellyfin.prom textfile metrics + +### Wed Jan 15 2026 (later) + +- Added Plex Media Server log collection (removed 2026-01-30) +- Added plex.prom metrics from plex_metrics role (removed 2026-01-30) diff --git a/docs/1768506761-XGYX.md b/docs/1768506761-XGYX.md new file mode 100644 index 0000000..c94e878 --- /dev/null +++ b/docs/1768506761-XGYX.md @@ -0,0 +1,82 @@ +--- +id: 1768506761-XGYX +aliases: + - loki +tags: + - blumeops +--- + +# Loki Management Log + +Loki is a log aggregation system running in Kubernetes (minikube on indri), providing log storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure. 
+ +## Service Details + +- URL: https://loki.tail8d86e.ts.net +- Namespace: `monitoring` +- Image: `grafana/loki:3.4.2` +- ArgoCD app: `loki` +- Storage: 50Gi PVC +- Retention: 31 days + +## Architecture + +- Single-node deployment with filesystem storage +- TSDB index with 24h period +- Logs collected by [[alloy|Grafana Alloy]] and pushed via Loki API +- Queried via Grafana using the Loki datasource + +## Useful Commands + +```bash +# View logs +kubectl --context=minikube-indri -n monitoring logs -f loki-0 + +# Check if Loki is ready +curl -s https://loki.tail8d86e.ts.net/ready + +# Sync from ArgoCD +argocd app sync loki +``` + +## Grafana Integration + +Loki is configured as a datasource in Grafana. To explore logs: + +1. Go to https://grafana.tail8d86e.ts.net/explore +2. Select "Loki" datasource +3. Use LogQL queries: + - `{service="forgejo"}` - all forgejo logs + - `{service="borgmatic", stream="stderr"}` - borgmatic errors + - `{host="indri"} |= "error"` - all logs containing "error" + +## ArgoCD Management + +Loki is deployed via ArgoCD from `argocd/manifests/loki/`: +- `statefulset.yaml` - StatefulSet with 50Gi PVC +- `configmap.yaml` - Loki configuration +- `service.yaml` - ClusterIP service +- `ingress-tailscale.yaml` - Tailscale Ingress + +## Log + +### Thu Jan 23 2026 + +- Suppressed noisy `v1 Endpoints is deprecated` warning from minikube storage-provisioner ([upstream issue](https://github.com/kubernetes/minikube/issues/21009)) +- Added JSON field extraction for zot compatibility (`message` vs `msg`) +- Removed logfmt parsing stage - `stage.match` selectors don't prevent Alloy from logging internal decode errors, and most structured logs use JSON anyway +- Fixed devpi dashboard JSON escaping + +### Wed Jan 22 2026 + +- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet +- Exposed via Tailscale Ingress at `loki.tail8d86e.ts.net` +- Alloy updated to push logs to k8s endpoint +- Retired ansible loki role from indri + +### Wed Jan 
15 2026 + +- Initial setup with single-node filesystem storage +- Configured 31-day retention with compactor +- Integrated with Grafana as datasource +- Logs collected via Alloy from all services diff --git a/docs/argocd.md b/docs/argocd.md new file mode 100644 index 0000000..d22c2fb --- /dev/null +++ b/docs/argocd.md @@ -0,0 +1,140 @@ +--- +id: argocd +aliases: + - argocd + - argo-cd +tags: + - blumeops +--- + +# ArgoCD Management Log + +ArgoCD provides GitOps continuous delivery for the [[minikube]] cluster on Indri. + +## Service Details + +- URL: https://argocd.tail8d86e.ts.net +- Namespace: `argocd` +- Git source: `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git` +- Manifests path: `argocd/` + +## Sync Policy Decision + +**Choice**: Manual sync for workload apps, auto-sync only for app-of-apps. + +**Rationale** (decided 2026-01-19 during Phase 1 migration): +- During migration, we want explicit control over what gets deployed +- Auto-sync could deploy broken changes while we're still learning the stack +- The app-of-apps (`apps`) auto-syncs so new Application manifests appear automatically +- But those Applications have manual sync, so actual workload changes require `argocd app sync ` + +**Pattern**: +| Application | Sync Policy | Why | +|-------------|-------------|-----| +| `apps` | Automated | Picks up new Application manifests from git | +| `argocd` | Manual | Self-management changes should be deliberate | +| `tailscale-operator` | Manual | Infrastructure changes need review | +| `cloudnative-pg` | Manual | Operator upgrades need care | +| `blumeops-pg` | Manual | Database changes are sensitive | +| `grafana` | Manual | Observability stack changes need review | +| `grafana-config` | Manual | Dashboard changes should be deliberate | +| `miniflux` | Manual | Application changes need review | +| `devpi` | Manual | PyPI proxy changes need review | + +**Future consideration**: After migration stabilizes, consider enabling auto-sync for stable 
workloads. Keep manual sync for infrastructure (operators, databases). + +## CLI Access + +```bash +# Login (uses Tailscale for network, prompts for password) +argocd login argocd.tail8d86e.ts.net --grpc-web + +# List apps +argocd app list + +# Sync an app +argocd app sync + +# Check diff before sync +argocd app diff + +# Get app details +argocd app get +``` + +## Applications + +| App | Path | Description | +|-----|------|-------------| +| `apps` | `argocd/apps/` | App-of-apps root | +| `argocd` | `argocd/manifests/argocd/` | ArgoCD self-management | +| `tailscale-operator` | `argocd/manifests/tailscale-operator/` | Tailscale k8s operator | +| `cloudnative-pg` | Helm chart (forge mirror) | PostgreSQL operator | +| `blumeops-pg` | `argocd/manifests/databases/` | PostgreSQL cluster | +| `prometheus` | `argocd/manifests/prometheus/` | Metrics storage | +| `loki` | `argocd/manifests/loki/` | Log aggregation | +| `grafana` | Helm chart (forge mirror) | Grafana dashboards | +| `grafana-config` | `argocd/manifests/grafana-config/` | Grafana ingress & dashboards | +| `alloy-k8s` | `argocd/manifests/alloy-k8s/` | Pod log collection & service probes | +| `kube-state-metrics` | `argocd/manifests/kube-state-metrics/` | K8s resource metrics | +| `miniflux` | `argocd/manifests/miniflux/` | RSS feed reader | +| `devpi` | `argocd/manifests/devpi/` | PyPI caching proxy | +| `torrent` | `argocd/manifests/torrent/` | BitTorrent daemon | +| `kiwix` | `argocd/manifests/kiwix/` | Offline Wikipedia & ZIM archives | +| `forgejo-runner` | `argocd/manifests/forgejo-runner/` | Forgejo Actions CI runner (host mode) | + +## Credentials + +- Admin password stored in 1Password (updated from initial auto-generated password) +- Git access via deploy key (SSH) stored in 1Password + +## Log + +### 2026-01-23 (CI/CD Bootstrap Phase 1) +- Added `forgejo-runner` - Forgejo Actions CI runner +- Runner uses host mode (jobs run directly in runner container, no Docker needed) +- Labels: `ubuntu-latest`, 
`ubuntu-22.04` +- Note: Stock runner lacks Node.js, so `actions/checkout@v4` doesn't work - use git clone instead +- See [[forgejo]] for runner token management and workflow examples + +### 2026-01-22 (Observability Cleanup) +- Added `alloy-k8s` - DaemonSet for automatic pod log collection and service health probes +- Added `kube-state-metrics` - provides k8s resource metrics (pod counts, resource requests, etc.) +- Enhanced Minikube dashboard with namespace filtering and resource usage panels +- Added "Services Health" dashboard with probe metrics for all k8s services +- Fixed macOS dashboard instance variable to only show macOS hosts +- Cleaned up stale data: removed old textfile metrics and directories from indri +- Removed stale `/opt/homebrew/var/loki` from borgmatic backup sources + +### 2026-01-22 (Phase 7) +- **Migrated Prometheus and Loki to k8s** - completed observability stack migration +- Both now running as StatefulSets with 50Gi PVCs +- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` and `loki.tail8d86e.ts.net` +- Grafana datasources updated to use k8s-internal service URLs +- Alloy rebuilt with CGO for Tailscale DNS resolution, pushes to k8s endpoints +- Retired ansible prometheus and loki roles from indri + +### 2026-01-21 (Phase 6) +- Added torrent (Transmission BitTorrent) to k8s +- Added kiwix (offline Wikipedia & ZIM archives) to k8s +- NFS storage from sifaka for shared torrent/ZIM data + +### 2026-01-20 (Phase 5) +- Added devpi (PyPI caching proxy) to k8s +- Custom container image in zot registry with devpi-server + devpi-web +- StatefulSet with 50Gi PVC for data persistence +- Changed `apps` Application to manual sync (was auto-sync with prune) + +### 2026-01-19 (Phase 2) +- Migrated Grafana from Homebrew/Ansible to Kubernetes +- Helm chart repos now mirrored to forge (cloudnative-pg-charts, grafana-helm-charts) +- SSH credential template (`repo-creds-forge`) for all forge repos +- Added indri SSH host key to ArgoCD known_hosts 
+- Tailscale service cutover: deleted old svc:grafana from Tailscale admin to free hostname +- Retired ansible grafana role + +### 2026-01-19 (Phase 1) +- Completed Phase 1 deployment +- Decided on manual sync policy for workloads +- Using internal [[forgejo]] as git source (not GitHub mirror) +- Exposed via Tailscale Ingress with Let's Encrypt TLS diff --git a/docs/borgmatic.md b/docs/borgmatic.md new file mode 100644 index 0000000..bfe54bc --- /dev/null +++ b/docs/borgmatic.md @@ -0,0 +1,176 @@ +--- +id: borgmatic +aliases: + - borgmatic + - borg-backup +tags: + - blumeops +--- + +# Borgmatic Management Log + +Borgmatic runs daily backups from Indri to Sifaka NAS using Borg backup. + +## Service Details + +- Installed via: mise (pipx) +- Config: `~/.config/borgmatic/config.yaml` (ansible-managed) +- Schedule: Daily at 2:00 AM via LaunchAgent +- Repository: `/Volumes/backups/borg/` on Sifaka + +## What Gets Backed Up + +**Directories:** +- `~/code/personal/zk` - Zettelkasten (primary) +- `/opt/homebrew/var/forgejo` - Git forge data +- `~/.config/borgmatic` - Borgmatic config itself +- `~/Documents` - Personal documents +- `~/Pictures` - Photos (see note below) + +**Note on iCloud Photos:** macOS Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally. Borgmatic only backs up what's on disk, so iCloud-only photos are NOT backed up. If you need full photo backups via borgmatic, either disable "Optimize Mac Storage" in Photos preferences, or use a tool like osxphotos which forces downloads. See log entry 2026-01-28. 
+ +**Databases:** +- `miniflux` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me) +- `teslamate` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me) + +**Not backed up (by design):** +- ZIM archives in `~/transmission/` - re-downloadable via torrent +- Prometheus metrics - ephemeral data +- Loki logs - ephemeral (now in k8s PVC) +- devpi data - in k8s PVC, backup strategy TBD + +## PostgreSQL Backup + +Borgmatic uses native `postgresql_databases` support to stream `pg_dump` directly to Borg: +- No intermediate files needed +- Database keeps running (no downtime) +- Consistent transactional snapshots +- Uses `borgmatic` user with `pg_read_all_data` role +- Password read from `~/.pgpass` (managed by borgmatic ansible role) +- Uses explicit `pg_dump_command` path (`/opt/homebrew/opt/postgresql@18/bin/pg_dump`) since LaunchAgent doesn't have homebrew in PATH +- Uses explicit `local_path` (`/opt/homebrew/bin/borg`) for same reason + +**Databases backed up:** +- `pg.ops.eblu.me:5432/miniflux` - CloudNativePG cluster in k8s +- `pg.ops.eblu.me:5432/teslamate` - CloudNativePG cluster in k8s + +## Ansible Management + +Borgmatic is fully managed via ansible in [[1767747119-YCPO|blumeops]]: + +```bash +mise run provision-indri -- --tags borgmatic +``` + +The role deploys: +- `~/.config/borgmatic/config.yaml` - Main configuration +- LaunchAgent plist for scheduled runs + +## Useful Commands + +```bash +# List archives +ssh indri 'mise x -- borgmatic list' + +# Extract from latest archive +ssh indri 'mise x -- borgmatic extract --archive latest --path /some/path' + +# Run backup manually +ssh indri 'mise x -- borgmatic create --verbosity 1' + +# Check repository health +ssh indri 'mise x -- borgmatic check' +``` + +## Retention Policy + +- 7 daily backups +- 12 monthly backups +- 1000 yearly backups (effectively forever) + +## Monitoring + +Borgmatic metrics are collected hourly via a script at `~/bin/borgmatic-metrics` and exposed to Prometheus 
via the node_exporter textfile collector. + +View the Grafana dashboard at: https://grafana.tail8d86e.ts.net (select "Borgmatic Backups" dashboard) + +Metrics include: +- `borgmatic_up` - repository accessibility +- `borgmatic_repo_deduplicated_size_bytes` - actual disk usage +- `borgmatic_last_archive_original_size_bytes` - size of data being backed up +- `borgmatic_last_archive_deduplicated_size_bytes` - new data added per backup +- `borgmatic_archive_count` - number of archives +- `borgmatic_last_archive_timestamp` - when last backup ran + +```bash +# Check metrics file +ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom' + +# Check metrics LaunchAgent status +ssh indri 'launchctl list | grep borgmatic-metrics' +``` + +## Log + +### Tue Jan 28 2026 + +- Investigated massive backup size increase (~69GB deduplicated, ~94GB per archive) +- Root cause: immich-sync role (added Jan 26, removed Jan 28) used osxphotos to export photos +- **Lesson learned:** osxphotos forces Photos.app to download all iCloud originals locally +- Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally +- Before immich-sync: borgmatic was backing up thumbnails (~few GB) +- After immich-sync: borgmatic now has full 42GB of photo originals +- This is actually a bonus - provides redundant photo backup alongside iCloud and Immich +- Retention policy means these photos will be kept in yearly archives essentially forever +- **Future plan:** Once Immich (on sifaka "photos" volume with Synology offsite backup) is fully set up, Pictures may be removed from borgmatic as redundant + +### Thu Jan 23 2026 + +- Note: Forgejo `app.ini` is now managed by ansible (secrets in 1Password) +- `/opt/homebrew/var/forgejo` still backed up for git repositories and data +- But `app.ini` recovery no longer depends on borgmatic (can be regenerated via ansible) + +### Wed Jan 22 2026 + +- Removed `/opt/homebrew/var/loki` from backup sources (stale data from pre-k8s 
migration) +- Loki now runs in k8s with ephemeral storage - logs are not backed up by design +- Verified backup integrity after cleanup + +### Mon Jan 20 2026 (P5) + +- Removed `~/devpi` from backup sources (devpi migrated to k8s) +- devpi data now in k8s PVC - backup strategy TBD + +### Sun Jan 19 2026 (P4) + +- Removed localhost PostgreSQL backup (brew pg retired) +- Updated to backup only `pg.tail8d86e.ts.net` (k8s CloudNativePG) +- Moved .pgpass management from postgresql role to borgmatic role + +### Sun Jan 19 2026 (P3) + +- Fixed borgmatic failing to find `borg` binary by adding `local_path` option to config +- Added k8s-pg (CloudNativePG cluster) backup alongside brew PostgreSQL +- Added ACL grant for `tag:homelab` → `tag:k8s` on port 5432 for backup access +- Successfully tested disaster recovery: restored miniflux data from borgmatic dump to k8s-pg +- Created `borgmatic` user in k8s-pg via CloudNativePG managed roles +- Both localhost and k8s-pg databases backed up during migration period + +### Sat Jan 18 2026 + +- Fixed borgmatic-metrics script failing in LaunchAgent context by using absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`) instead of `mise x -- borg` +- This was causing the Grafana dashboard to show "Repository Status: DOWN" and missing time series data + +### Fri Jan 17 2026 + +- Fixed PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found") +- Removed `~/code/3rd/kiwix-tools` from backups (was just symlinks, ZIM archives are re-downloadable) +- Enabled Loki log backup (removed from exclude_patterns) +- Added borgmatic_metrics role for Prometheus metrics collection +- Added Grafana dashboard for backup monitoring (size trends, dedup ratio, time since last backup) + +### Thu Jan 16 2026 + +- Moved config from manual management to ansible-managed template +- Added `postgresql_databases` backup for miniflux database +- Config now deployed via 
`ansible/roles/borgmatic/templates/config.yaml.j2` diff --git a/docs/external-secrets.md b/docs/external-secrets.md new file mode 100644 index 0000000..64c6a60 --- /dev/null +++ b/docs/external-secrets.md @@ -0,0 +1,75 @@ +--- +id: external-secrets +aliases: + - external-secrets + - eso + - external-secrets-operator +tags: + - blumeops +--- + +# External Secrets Operator + +External Secrets Operator (ESO) syncs secrets from 1Password to Kubernetes Secrets via 1Password Connect. + +## Architecture + +``` +1Password Cloud + | + v +1Password Connect (namespace: 1password) + | + v +External Secrets Operator (namespace: external-secrets) + | + v +Native Kubernetes Secrets +``` + +## Usage + +ClusterSecretStore `onepassword-blumeops` provides access to the blumeops vault. See `argocd/manifests/devpi/external-secret.yaml` for a simple example. + +**Important:** 1Password Connect doesn't support the `?ssh-format=openssh` query parameter. SSH keys must be stored as Secure Notes with the OpenSSH-formatted key (see `argocd-forge-ssh-key` item). + +```bash +# Check all ExternalSecrets +kubectl --context=minikube-indri get externalsecret -A + +# Find 1Password field names +op item get --vault blumeops --format json | jq '.fields[] | .label' +``` + +## Bootstrap (One-Time Setup) + +If reinstalling from scratch: + +1. Create Connect server credentials: + ```bash + op connect server create blumeops --vaults blumeops + op connect token create blumeops --server --vault blumeops + ``` + +2. Store in 1Password item "1Password Connect": + - `credentials-file`: raw JSON + - `credentials-base64`: base64-encoded JSON + - `token`: access token + +3. Apply bootstrap secret: + ```bash + kubectl --context=minikube-indri create namespace 1password + op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | \ + kubectl --context=minikube-indri apply -f - + ``` + +4. 
Sync apps in order: + - `argocd app sync 1password-connect` + - `argocd app sync external-secrets-crds` + - `argocd app sync external-secrets` + - `argocd app sync external-secrets-config` + +## Related + +- [[1767747119-YCPO|BlumeOps]] +- [[argocd|ArgoCD]] diff --git a/docs/grafana.md b/docs/grafana.md new file mode 100644 index 0000000..a086b2b --- /dev/null +++ b/docs/grafana.md @@ -0,0 +1,58 @@ +--- +id: grafana +aliases: + - grafana +tags: + - blumeops +--- + +# Grafana Management Log + +Grafana provides dashboards and observability for [[blumeops]]. + +## Service Details + +- URL: https://grafana.ops.eblu.me (also https://grafana.tail8d86e.ts.net) +- Namespace: `monitoring` +- Helm chart: grafana (mirrored to forge) +- Values: `argocd/manifests/grafana/values.yaml` +- Dashboards: `argocd/manifests/grafana-config/dashboards/` + +## Embedding Note + +Grafana panel embedding via iframes was attempted for Homepage but didn't work well: +- Homepage's iframe widget doesn't support width constraints (only height) +- Grafana's "Public Dashboards" feature doesn't support template variables or PostgreSQL datasources +- Anonymous auth would be required, which exposes all dashboards + +Current config has `allow_embedding: false`. If revisiting this, see git history for the iframe attempt (2026-01-30). + +## Datasources + +| Name | Type | URL | +|------|------|-----| +| Prometheus | prometheus | `http://prometheus.monitoring.svc.cluster.local:9090` | +| Loki | loki | `http://loki.monitoring.svc.cluster.local:3100` | +| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` | + +## Dashboard Provisioning + +Dashboards are provisioned via ConfigMaps with label `grafana_dashboard: "1"`. The sidecar watches for these and loads them automatically. + +To add a dashboard: +1. Create ConfigMap in `argocd/manifests/grafana-config/dashboards/` +2. Add label `grafana_dashboard: "1"` +3. Optionally add annotation `grafana_folder: "FolderName"` for organization +4. 
Sync the `grafana-config` ArgoCD app + +## Log + +### 2026-01-30 +- Attempted Grafana iframe embeds for Homepage metrics panel +- Issues: width constraints don't work, some panels fail to load +- Reverted to authenticated-only access (no anonymous auth) + +### 2026-01-19 (Phase 2) +- Migrated from Homebrew/Ansible to Kubernetes +- Helm chart mirrored to forge +- Exposed via Tailscale Ingress diff --git a/docs/indri.md b/docs/indri.md new file mode 100644 index 0000000..206f8aa --- /dev/null +++ b/docs/indri.md @@ -0,0 +1,65 @@ +--- +id: indri +aliases: + - indri + - mac-mini +tags: + - blumeops +--- + +# Indri Maintenance Log + +Indri is a Mac Mini M1 (2020) serving as the primary [[1767747119-YCPO|BlumeOps]] server. + +## Host Details + +- Model: Mac mini M1, 2020 (Macmini9,1) +- Storage: 2TB internal SSD +- macOS: 15.7.3 (Sequoia) +- Role: Primary server for homelab services + +## Passwordless Sudo + +Configured passwordless sudo for `erichblume` user to allow ansible `become: true` tasks to run without password prompts: + +```bash +# Config at /etc/sudoers.d/erichblume +erichblume ALL=(ALL) NOPASSWD: ALL +``` + +This is acceptable given the security model - tailnet access is the trust boundary. + +## Sleep Prevention + +Indri must stay awake to serve network requests. Currently using **Amphetamine** (App Store) to prevent sleep. + +**Configuration:** +- Start Session At Launch: enabled +- Default Duration: indefinite +- Allow Closed-Display Sleep: enabled (no display attached) + +**Known Issue:** Amphetamine can crash after extended uptime (~12 days observed), leaving the system unprotected. 
If this becomes a recurring problem, consider switching to system-level sleep prevention: + +```bash +# Option 1: Disable sleep via pmset (requires sudo) +sudo pmset -c sleep 0 displaysleep 0 + +# Option 2: Use caffeinate daemon via LaunchAgent +# Create ~/Library/LaunchAgents/com.local.caffeinate.plist +caffeinate -s # -s = prevent sleep on AC power +``` + +These could be managed via ansible for reliability. + +## Log + +### Mon Jan 20 2026 + +**Amphetamine crash caused overnight sleep** + +- Amphetamine 5.3.2 crashed at 19:08 on Jan 19 (segfault in `objc_release` during timer callback) +- System went to sleep at 19:20, stayed asleep overnight +- Discovered when services were unreachable; manually restarted Amphetamine at ~07:30 +- Crash report: `~/Library/Logs/DiagnosticReports/Amphetamine-2026-01-19-190921.ips` +- Root cause: Memory management bug in Amphetamine during long-running session (~12 days uptime) +- Action: Monitoring for now; if recurs, will implement `pmset`/`caffeinate` via ansible diff --git a/docs/jellyfin.md b/docs/jellyfin.md new file mode 100644 index 0000000..a0e6129 --- /dev/null +++ b/docs/jellyfin.md @@ -0,0 +1,90 @@ +--- +id: jellyfin +aliases: + - jellyfin +tags: + - blumeops +--- + +# Jellyfin Management Log + +Jellyfin is a free, open-source media server running natively on [[indri|Indri]] for full VideoToolbox hardware transcoding support. 
+ +## Service Details + +- URL: https://jellyfin.ops.eblu.me +- Port: 8096 (localhost only, proxied via Caddy) +- Data directory: `~/Library/Application Support/jellyfin` +- Media path: `/Volumes/allisonflix` (NFS from sifaka) +- LaunchAgent: `mcquack.jellyfin` + +## Useful Commands + +```bash +# Check LaunchAgent status +ssh indri 'launchctl list | grep jellyfin' + +# View logs +ssh indri 'tail -f ~/Library/Logs/mcquack.jellyfin.err.log' + +# Check port is listening +ssh indri 'lsof -nP -iTCP:8096 -sTCP:LISTEN' + +# Restart Jellyfin +ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist && launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist' + +# Check metrics +ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/jellyfin.prom' +``` + +## Hardware Transcoding + +Jellyfin uses Apple VideoToolbox for hardware-accelerated transcoding on the M1 Mac Mini. + +**Capabilities:** +- H.264 encode/decode: Hardware +- HEVC (H.265) encode/decode: Hardware +- AV1 decode: Software only (requires M3+) +- HDR to SDR tone mapping: VPP (hardware) +- Concurrent 4K streams: ~3 with HDR tonemapping + +**Configuration** (Dashboard > Playback): +1. Hardware Acceleration: Apple VideoToolbox +2. Allow hardware encoding: Enabled +3. VPP Tone mapping: Enabled (for HDR to SDR) + +## Observability + +- Metrics: Collected via `jellyfin_metrics` ansible role to Prometheus textfile +- Logs: Forwarded to Loki via Alloy (`service="jellyfin"`) +- Dashboard: "Jellyfin Media Server" in Grafana + +### Metrics collected: +- `jellyfin_up` - Server availability +- `jellyfin_version_info` - Server version +- `jellyfin_library_items{library,type}` - Library counts +- `jellyfin_sessions_total` - Active sessions +- `jellyfin_sessions_playing` - Playing sessions +- `jellyfin_transcode_sessions_total` - Transcoding sessions + +## API Key Setup + +Metrics collection requires an API key: + +1. Open https://jellyfin.ops.eblu.me +2. Go to Dashboard > API Keys > Add +3. 
Create key with description "metrics" +4. Save to indri: +```bash +ssh indri 'echo "YOUR_API_KEY" > ~/.jellyfin-api-key && chmod 600 ~/.jellyfin-api-key' +``` + +## Log + +### 2026-01-30 (Initial Deployment) +- Deployed Jellyfin natively on indri via Ansible +- Installed via Homebrew cask, managed via LaunchAgent +- Added Caddy routing for `jellyfin.ops.eblu.me` +- Added metrics collection (jellyfin_metrics role) +- Added log collection via Alloy +- Created Grafana dashboard diff --git a/docs/kiwix.md b/docs/kiwix.md new file mode 100644 index 0000000..20021c0 --- /dev/null +++ b/docs/kiwix.md @@ -0,0 +1,103 @@ +--- +id: kiwix +aliases: + - kiwix +tags: + - blumeops +--- + +# Kiwix Management Log + +Kiwix serves offline Wikipedia (and other ZIM archives) in Kubernetes via Tailscale at https://kiwix.tail8d86e.ts.net. + +## Service Details + +- URL: https://kiwix.tail8d86e.ts.net +- Namespace: `kiwix` +- Image: `ghcr.io/kiwix/kiwix-serve:3.8.1` +- ArgoCD app: `kiwix` +- Storage: NFS mount from sifaka (`/volume1/torrents`) + +## Architecture + +The kiwix deployment has two components: + +1. **kiwix-serve** - Main container serving ZIM files at port 80 +2. **torrent-sync** - Sidecar that syncs declarative ZIM torrent list to Transmission + +A CronJob (`zim-watcher`) runs hourly to detect new ZIM files and trigger a deployment restart when needed. 
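
The change detection that `zim-watcher` performs for Kiwix could be sketched roughly as below. This is a minimal illustration, not the CronJob's actual script: the function names `zim_fingerprint` and `needs_restart`, and the idea of keeping the last-seen state in a file, are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a "new ZIM file" check. This is NOT the real
# zim-watcher script; function names and the state-file approach are assumptions.

# Print a fingerprint of the *.zim file names directly inside a directory.
zim_fingerprint() {
  local dir="$1"
  # Sort the names so the result does not depend on directory listing order.
  find "$dir" -maxdepth 1 -type f -name '*.zim' -exec basename {} \; \
    | sort | cksum | cut -d' ' -f1
}

# Succeed (exit status 0) when the ZIM set differs from the last recorded one.
# The current fingerprint is always written back to the state file.
needs_restart() {
  local dir="$1" state_file="$2"
  local current previous=""
  current="$(zim_fingerprint "$dir")"
  if [ -f "$state_file" ]; then
    previous="$(cat "$state_file")"
  fi
  printf '%s\n' "$current" > "$state_file"
  [ "$current" != "$previous" ]
}
```

A caller might chain this as `needs_restart <zim-dir> <state-file> && kubectl -n kiwix rollout restart deployment/kiwix`, where both paths are placeholders. The first run always reports a change because no state has been recorded yet, and the fingerprint only tracks file names, so a partially downloaded file that already carries its final name would trigger an early restart.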
+ +## Useful Commands + +```bash +# View kiwix logs +kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c kiwix-serve + +# View torrent sync logs +kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c torrent-sync + +# Check ZIM watcher job +kubectl --context=minikube-indri -n kiwix get cronjob zim-watcher + +# Manually trigger ZIM watcher +kubectl --context=minikube-indri -n kiwix create job --from=cronjob/zim-watcher zim-watcher-manual + +# Sync from ArgoCD +argocd app sync kiwix +``` + +## ArgoCD Management + +Kiwix is deployed via ArgoCD from `argocd/manifests/kiwix/`: +- `deployment.yaml` - Kiwix-serve + torrent-sync sidecar +- `service.yaml` - ClusterIP service +- `ingress-tailscale.yaml` - Tailscale Ingress +- `configmap-zim-torrents.yaml` - Declarative list of ZIM torrents to download +- `configmap-sync-script.yaml` - Script to sync torrents to Transmission +- `cronjob-zim-watcher.yaml` - Hourly job to restart kiwix on new ZIMs + +## Adding New ZIM Archives + +1. Edit `argocd/manifests/kiwix/configmap-zim-torrents.yaml` +2. Add the torrent URL from https://download.kiwix.org/zim/ +3. Sync the kiwix app: `argocd app sync kiwix` +4. The torrent-sync sidecar will add the torrent to [[transmission|Transmission]] +5. Once downloaded, the zim-watcher CronJob will detect it and restart kiwix + +## Configured Archives + +The declarative torrent list includes: +- Wikipedia top 1M English articles with images +- Project Gutenberg (60,000+ public domain books) +- iFixit repair guides +- Stack Exchange sites (SuperUser, Math, etc.) +- LibreTexts textbooks (Bio, Chem, Eng, Math, Phys, Humanities) +- DevDocs (developer documentation bundles) + +See `argocd/manifests/kiwix/configmap-zim-torrents.yaml` for the full list. + +## Storage + +ZIM files are stored on sifaka NAS at `/volume1/torrents/complete/`. The kiwix pod mounts this directory via NFS. 
+ +**Note**: The NFS mount works because minikube uses the docker driver which NATs through indri's LAN IP, allowing direct access to sifaka. + +## Log + +### 2026-01-21 (P6) +- **Migrated to Kubernetes** (Phase 6 of k8s migration) +- Direct NFS mount from sifaka (no PVC, shared with transmission) +- Torrent-sync sidecar adds configured ZIMs to Transmission +- ZIM-watcher CronJob restarts deployment when new files appear +- Tailscale Ingress at `kiwix.tail8d86e.ts.net` +- Retired ansible kiwix role from indri + +### 2026-01-14 +- Added transmission integration for background torrent downloads +- Enabled Gutenberg, iFixit, SuperUser, Math SE, and all LibreTexts archives + +### 2026-01-13 +- Added kiwix role to ansible playbook +- Operationalized ZIM archive downloads with configurable list +- Initial setup with kiwix-tools binary on indri +- Managed via LaunchAgent on port 5501 diff --git a/docs/miniflux.md b/docs/miniflux.md new file mode 100644 index 0000000..d0969cf --- /dev/null +++ b/docs/miniflux.md @@ -0,0 +1,83 @@ +--- +id: miniflux +aliases: + - miniflux + - feed + - rss +tags: + - blumeops +--- + +# Miniflux Management Log + +Miniflux is a minimalist RSS/Atom feed reader running in Kubernetes (minikube on indri). 
+ +## Service Details + +- URL: https://feed.tail8d86e.ts.net +- Namespace: miniflux +- Image: ghcr.io/miniflux/miniflux:latest +- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net) +- ArgoCD app: miniflux + +## Useful Commands + +```bash +# View logs +kubectl -n miniflux logs -f deployment/miniflux + +# Restart deployment +kubectl -n miniflux rollout restart deployment/miniflux + +# Check health +curl https://feed.tail8d86e.ts.net/healthcheck + +# Sync from ArgoCD +argocd app sync miniflux +``` + +## ArgoCD Management + +Miniflux is deployed via ArgoCD from `argocd/manifests/miniflux/`: +- `deployment.yaml` - Deployment with environment configuration +- `service.yaml` - ClusterIP service +- `ingress-tailscale.yaml` - Tailscale Ingress for external access + +## Credentials + +The miniflux database user password is auto-generated by CloudNativePG and stored in the `blumeops-pg-app` secret in the databases namespace. + +To recreate the miniflux-db secret: +```bash +kubectl create secret generic miniflux-db -n miniflux \ + --from-literal=url="$(kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)" +``` + +## Features + +- Keyboard shortcuts for efficient reading +- Fever and Google Reader API compatible +- Mobile-friendly web interface +- OPML import/export +- Content scraping for full articles + +## Backup + +Feed subscriptions and read state stored in [[postgresql]], backed up via borgmatic's postgresql_databases hook. 
+ +## Log + +### Sun Jan 19 2026 + +- **Migrated to Kubernetes** (Phase 4 of k8s migration) +- Deployed via ArgoCD in `miniflux` namespace +- Database connection via internal k8s DNS to CloudNativePG cluster +- Exposed via Tailscale Ingress at feed.tail8d86e.ts.net +- Removed brew miniflux service and ansible role from indri +- Fixed table ownership issue after P3 restore (tables were owned by eblume, needed to be owned by miniflux) + +### Thu Jan 16 2026 + +- Initial setup with Miniflux 2.x on brew +- Connected to PostgreSQL 18 on localhost +- Exposed via Tailscale at feed.tail8d86e.ts.net diff --git a/docs/minikube.md b/docs/minikube.md new file mode 100644 index 0000000..dbf6aa3 --- /dev/null +++ b/docs/minikube.md @@ -0,0 +1,137 @@ +--- +id: minikube +aliases: + - minikube + - kubernetes + - k8s +tags: + - blumeops +--- + +# Minikube Management Log + +Minikube provides a single-node Kubernetes cluster on Indri for running containerized services. + +## Cluster Details + +- Driver: **docker** (runs as container inside Docker Desktop) +- Container runtime: docker +- Kubernetes version: v1.34.0 +- Resources: 6 CPUs, 11GB RAM (leaves 1GB for Docker Desktop overhead), 200GB disk +- API server: https://k8s.tail8d86e.ts.net (Tailscale service with TCP passthrough) +- Internal port: dynamic (currently 50820 - Docker maps random host port to container's 6443) + +**Prerequisites:** Docker Desktop must be installed and running with at least 12GB memory allocated. + +## Remote Access from Gilbert + +Run `mise run ensure-minikube-indri-kubectl-config` to set up kubectl access. This script: +1. Fetches certificates from indri via SSH +2. 
Creates kubeconfig at `~/.kube/minikube-indri/config.yml` + +**Fish abbreviations** (in `~/.config/fish/config.fish`): +- `ki` -> `kubectl --context=minikube-indri` +- `k9i` -> `k9s --context=minikube-indri` +- `k9` -> `k9s` + +```bash +# Quick access via abbreviations +ki get nodes +k9i + +# Or explicitly set context +kubectl config use-context minikube-indri +kubectl get nodes +``` + +## Volume Mounting (for P6 kiwix/transmission) + +**Direct NFS from pods to sifaka** - tested and working. + +Docker NATs outbound traffic through indri's LAN IP (192.168.1.50). Sifaka's NFS exports allow: +- `192.168.1.0/24` - Docker containers via indri NAT +- `100.64.0.0/10` - Tailscale clients + +Pods mount NFS directly: +```yaml +volumes: + - name: torrents + nfs: + server: sifaka + path: /volume1/torrents +``` + +No LaunchAgents, no `minikube mount`, no hostPath complexity needed. + +## Useful Commands (on indri) + +```bash +# Cluster status +minikube status + +# Start/stop cluster +minikube start +minikube stop + +# Access dashboard +minikube dashboard + +# SSH into node +minikube ssh + +# View logs +minikube logs + +# Get API server URL (shows current port) +kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}" +``` + +## Registry Mirror (Zot) + +Containerd is configured to use [[zot]] on indri as a pull-through cache for container images. This is managed by the ansible `minikube` role. + +Config location: `/etc/containerd/certs.d/<registry>/hosts.toml` (inside minikube container) + +With docker driver, uses `host.minikube.internal:5050` to reach zot on the host.
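For orientation, containerd's per-registry host configuration generally takes the shape sketched below. This is a minimal sketch, not the file the ansible `minikube` role actually renders: the helper name `render_hosts_toml` is hypothetical, the upstream `server` URL is supplied by the caller as an assumption, and the real config may carry extra options (for example a path prefix) so that zot resolves its namespaced cache paths.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a containerd hosts.toml that tries a mirror
# first and falls back to the upstream registry named in `server`.
# Assumption: the mirror is zot at host.minikube.internal:5050 (taken from
# this card); the upstream URL is whatever the caller passes in.
render_hosts_toml() {
  local upstream="$1"
  local mirror="${2:-http://host.minikube.internal:5050}"
  cat <<EOF
server = "${upstream}"

[host."${mirror}"]
  capabilities = ["pull", "resolve"]
EOF
}

# Example: render a mirror config for Docker Hub to stdout.
render_hosts_toml "https://registry-1.docker.io"
```

The output would be written to the `hosts.toml` for the matching registry directory; whether the role uses exactly these capabilities is not confirmed here.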
+ +Mirrors configured for: +- `registry.ops.eblu.me` (private images) +- `docker.io` +- `ghcr.io` +- `quay.io` + +To verify the mirror is working: +```bash +# Check zot's cached images +curl -s http://localhost:5050/v2/_catalog | jq +``` + +## Log + +### 2026-01-21 (Docker Driver Migration) +- **Migrated from qemu2 to docker driver** (Phase 5.1) +- qemu2 had Tailscale TCP proxy issue (TLS handshake timeout to VM IP) +- docker driver puts API server on localhost, which Tailscale serve handles correctly +- Removed socket_vmnet, qemu dependencies +- Removed NFS/minikube-mount LaunchAgents (will re-add NFS for P6 with simpler hostPath approach) +- API server port is now dynamic (Docker assigns random host port) +- Ansible role updated to query port and configure tailscale serve accordingly +- Created `mise run ensure-minikube-indri-kubectl-config` for workstation setup + +### 2026-01-21 (QEMU2 Migration - superseded) +- Migrated from podman to qemu2 driver +- Podman driver had fundamental limitations preventing volume mounts +- qemu2 created actual VM with full kernel capabilities +- Volume mounting solution: NFS on host + minikube mount passthrough +- **Issue discovered:** Tailscale TCP proxy to VM IP (192.168.105.2:6443) fails with TLS timeout + +### 2026-01-19 +- Configured CRI-O registry mirror to use zot as pull-through cache +- Added ansible automation to apply mirror config on provisioning +- Fixed ansible hanging: `minikube ssh` with piped stdin requires `--native-ssh=false` + +### 2026-01-18 +- Initial cluster setup for k8s migration Phase 0 +- Configured for remote access with --apiserver-names=indri +- 1Password credential integration for kubectl from gilbert +- Exposed as Tailscale service `k8s.tail8d86e.ts.net` with TCP passthrough diff --git a/docs/navidrome.md b/docs/navidrome.md new file mode 100644 index 0000000..ef66d5b --- /dev/null +++ b/docs/navidrome.md @@ -0,0 +1,80 @@ +--- +id: navidrome +aliases: + - DJ +tags: + - blumeops + - service +--- + 
+Navidrome is a self-hosted music streaming server deployed on [[blumeops|BlumeOps]]. + +# Access + +- **Primary URL**: https://dj.ops.eblu.me (via Caddy) +- **Tailscale URL**: https://dj.tail8d86e.ts.net + +# Deployment + +Navidrome runs in Kubernetes (minikube on [[indri]]) and is managed via [[argocd|ArgoCD]]. + +**Manifests**: `argocd/manifests/navidrome/` + +## Storage + +| Mount | Type | Source | Access | +|---------|-------------------|-------------------------|------------| +| /music | NFS PV | sifaka:/volume1/music | Read-only | +| /data | Local PVC (10Gi) | minikube storage class | Read-write | + +The `/data` directory contains: +- SQLite database +- Configuration +- Cache files + +## Configuration + +Environment variables set in deployment: +- `ND_SCANSCHEDULE=1h` - Rescan library every hour +- `ND_LOGLEVEL=info` - Standard logging level +- `ND_MUSICFOLDER=/music` - Music library path +- `ND_DATAFOLDER=/data` - Data directory path + +## Initial Setup + +On first access, Navidrome will prompt to create an admin user. No default credentials. 
+ +# Operations + +## Sync Application + +```bash +argocd app sync navidrome +``` + +## Check Status + +```bash +argocd app get navidrome +kubectl --context=minikube-indri -n navidrome get pods +kubectl --context=minikube-indri -n navidrome logs deploy/navidrome +``` + +## Verify NFS Mount + +```bash +kubectl --context=minikube-indri -n navidrome exec deploy/navidrome -- ls /music +``` + +## Force Library Rescan + +Access Settings > Library in the web UI, or trigger via API: +```bash +curl -X POST https://dj.ops.eblu.me/api/library/scan -H "x-nd-authorization: Bearer <token>" +``` + +# Related + +- [[jellyfin]] - Video streaming (runs on indri directly) +- [[argocd]] - GitOps deployment +- [[blumeops]] - Infrastructure overview diff --git a/docs/postgresql.md b/docs/postgresql.md new file mode 100644 index 0000000..fd2014c --- /dev/null +++ b/docs/postgresql.md @@ -0,0 +1,131 @@ +--- +id: postgresql +aliases: + - postgresql + - postgres + - pg +tags: + - blumeops +--- + +# PostgreSQL Management Log + +PostgreSQL database cluster running in Kubernetes (minikube on indri) via CloudNativePG operator, providing storage for [[miniflux]] and other services.
+ +## Quick Connect + +```bash +# Connect as superuser (fetches password from 1Password) +PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -d miniflux +``` + +## Service Details + +- URL: tcp://pg.tail8d86e.ts.net:5432 +- Metrics: http://cnpg-metrics.tail8d86e.ts.net:9187/metrics +- Namespace: databases +- Cluster name: blumeops-pg +- Operator: CloudNativePG +- ArgoCD app: blumeops-pg + +## Databases + +| Database | Owner | Purpose | +|----------|----------|----------------------------| +| miniflux | miniflux | Miniflux feed reader data | + +## Users + +| User | Role | Purpose | +|-----------|------------------|------------------------| +| postgres | superuser | CNPG internal | +| miniflux | app owner | Owns miniflux database | +| eblume | superuser | Admin access | +| borgmatic | pg_read_all_data | Backup access | + +## Useful Commands + +```bash +# List databases +PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\l" + +# List users +PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\du" + +# View CNPG cluster status +kubectl -n databases get cluster blumeops-pg + +# View pod logs +kubectl -n databases logs -f blumeops-pg-1 +``` + +## Backup + +PostgreSQL data is backed up via borgmatic from indri using the `postgresql_databases` hook, which streams pg_dump directly to Borg for consistent backups. + +Borgmatic config (`~/.config/borgmatic/config.yaml`): +```yaml +postgresql_databases: + - name: miniflux + hostname: pg.tail8d86e.ts.net + port: 5432 + username: borgmatic +``` + +Password is read from `~/.pgpass` (managed by borgmatic ansible role). 
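The `~/.pgpass` entry that borgmatic relies on follows PostgreSQL's standard colon-separated format, `hostname:port:database:username:password`. A minimal sketch of building such a line is below; the helper name `pgpass_line` and the example password are hypothetical, and the real file is written by the borgmatic ansible role rather than by this snippet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one ~/.pgpass entry.
# Field order: hostname:port:database:username:password
pgpass_line() {
  local host="$1" port="$2" db="$3" user="$4" password="$5"
  # Backslashes and colons inside the password must be backslash-escaped,
  # otherwise they would be read as field separators.
  password=$(printf '%s' "$password" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g')
  printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$password"
}

# Example with a placeholder password (not a real credential).
pgpass_line pg.tail8d86e.ts.net 5432 miniflux borgmatic 'example-password'
```

Note that libpq ignores the password file unless its permissions deny group and world access (for example `chmod 600 ~/.pgpass`).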
+ +## ArgoCD Management + +```bash +# Sync cluster changes +argocd app sync blumeops-pg + +# Force reconcile +kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite +``` + +**Files:** +- Cluster spec: `argocd/manifests/databases/blumeops-pg.yaml` +- Tailscale service: `argocd/manifests/databases/service-tailscale.yaml` +- Secrets: `secret-eblume.yaml.tpl`, `secret-borgmatic.yaml.tpl` (via `op inject`) + +## Credentials + +**1Password items:** +- `guxu3j7ajhjyey6xxl2ovsl2ui` - eblume superuser password +- `mw2bv5we7woicjza7hc6s44yvy` - borgmatic user password + +**CNPG-managed secrets:** +- `blumeops-pg-app` - miniflux user (auto-generated password) +- `blumeops-pg-eblume` - eblume superuser +- `blumeops-pg-borgmatic` - borgmatic backup user + +## Log + +### Wed Jan 22 2026 + +- Added CNPG metrics collection via Tailscale service at `cnpg-metrics.tail8d86e.ts.net:9187` +- Updated PostgreSQL Grafana dashboard to use CNPG metric names (`cnpg_*` prefix) +- Prometheus on indri now scrapes CNPG metrics directly + +### Sun Jan 19 2026 (P4) + +- **Retired brew PostgreSQL** - k8s CloudNativePG is now the only PostgreSQL +- Renamed Tailscale hostname from `k8s-pg` to `pg` (canonical) +- Removed postgresql ansible role from indri +- Moved .pgpass management to borgmatic role +- Updated borgmatic to backup only `pg.tail8d86e.ts.net` +- Fixed table ownership issue: P3 restore created tables owned by eblume, transferred to miniflux + +### Sun Jan 19 2026 (P3) + +- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg +- Added borgmatic user to k8s-pg via CloudNativePG managed roles +- Both brew and k8s PostgreSQL backed up by borgmatic during migration +- Added Tailscale ACL: `tag:homelab` → `tag:k8s` on port 5432 for backup access + +### Thu Jan 16 2026 + +- Initial setup with PostgreSQL 18 (brew) +- Created miniflux database and user +- Exposed via Tailscale at pg.tail8d86e.ts.net diff --git 
a/docs/pulumi.md b/docs/pulumi.md new file mode 100644 index 0000000..dbe071c --- /dev/null +++ b/docs/pulumi.md @@ -0,0 +1,73 @@ +--- +id: pulumi +aliases: + - pulumi + - tailnet-iac +tags: + - blumeops +--- + +# Pulumi Tailnet IaC Management Log + +Pulumi manages the tail8d86e.ts.net tailnet configuration, including ACLs, tags, and DNS settings. + +## Architecture + +Two-layer approach: +- **Layer 1 (Pulumi)**: Tailnet-wide config - ACLs, tags, DNS (this card) +- **Layer 2 (Ansible)**: Node-local `tailscale serve` config - see `tailscale_serve` role + +## Service Details + +- State backend: Pulumi Cloud (https://app.pulumi.com/eblume/blumeops-tailnet) +- Stack: `tail8d86e` +- Config directory: `pulumi/` in blumeops repo +- Policy file: `pulumi/policy.hujson` (HuJSON with comments) + +## Authentication + +Uses OAuth client stored in 1Password (blumeops vault): +- Client configured with scopes: acl, dns, devices, services +- Auto-applies `tag:blumeops` to IaC-managed resources + +## Useful Commands + +```bash +# Preview changes +mise run tailnet-preview + +# Apply changes +mise run tailnet-up + +# View current state +mise run tailnet-preview + +# Pass additional args +mise run tailnet-up -- --yes +``` + +## Making ACL Changes + +1. Edit `pulumi/policy.hujson` in the blumeops repo +2. Run `mise run tailnet-preview` to see what will change +3. Run `mise run tailnet-up` to apply +4. 
Commit and push + +## What's Managed + +Currently managed by Pulumi: +- ACL policy (`tailscale:index:Acl`) + +Can be added later: +- DNS nameservers (`tailscale:index:DnsNameservers`) +- DNS search paths (`tailscale:index:DnsSearchPaths`) +- Tailnet settings (`tailscale:index:TailnetSettings`) + +## Log + +### Wed Jan 15 2026 + +- Initial setup with Pulumi + Python +- Imported existing ACL from Tailscale +- State stored in Pulumi Cloud (free tier) +- OAuth authentication via 1Password diff --git a/docs/teslamate.md b/docs/teslamate.md new file mode 100644 index 0000000..195cb2b --- /dev/null +++ b/docs/teslamate.md @@ -0,0 +1,113 @@ +--- +id: teslamate +aliases: + - teslamate + - tesla +tags: + - blumeops +--- + +# TeslaMate + +TeslaMate is a self-hosted Tesla data logger running in Kubernetes (minikube on indri), collecting and visualizing vehicle data from the Tesla Owner API. + +## Service Details + +- URL: https://tesla.tail8d86e.ts.net +- Namespace: `teslamate` +- Image: `teslamate/teslamate:2.2.0` +- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net) +- ArgoCD app: `teslamate` + +## What TeslaMate Collects + +- Battery level, state of charge, range estimates +- Charging sessions (location, energy, cost, duration) +- Drives (distance, efficiency, routes) +- Climate/HVAC usage +- Software update history +- Vampire drain analysis +- Vehicle states (asleep, driving, charging, online) + +## Grafana Dashboards + +18 dashboards available in Grafana under the "TeslaMate" folder at https://grafana.tail8d86e.ts.net: + +- Overview, Charges, Drives, Efficiency, States +- Battery Health, Vampire Drain, Statistics +- Charge Level, Locations, Trip, Mileage +- Drive Stats, Charging Stats, Projected Range +- Timeline, Updates, Visited + +Dashboards use the `TeslaMate` PostgreSQL datasource (not Prometheus). 
+ +## Useful Commands + +```bash +# View logs +kubectl --context=minikube-indri -n teslamate logs -f deployment/teslamate + +# Check pod status +kubectl --context=minikube-indri -n teslamate get pods + +# Restart deployment +kubectl --context=minikube-indri -n teslamate rollout restart deployment/teslamate + +# Sync from ArgoCD +argocd app sync teslamate +``` + +## Credentials + +**1Password items (blumeops vault):** +- `TeslaMate` - contains `db_password` and `api_enc_key` fields + +**Kubernetes secrets:** +- `teslamate-db` (teslamate ns) - DATABASE_PASS for PostgreSQL connection +- `teslamate-encryption` (teslamate ns) - ENCRYPTION_KEY for token encryption +- `blumeops-pg-teslamate` (databases ns) - CloudNativePG managed role password +- `grafana-teslamate-datasource` (monitoring ns) - Grafana datasource password + +## Backup + +TeslaMate data is backed up via [[borgmatic]]: +- PostgreSQL database `teslamate` included in `borgmatic_postgresql_databases` +- Backed up alongside miniflux to sifaka NAS + +## Tesla API Authentication + +TeslaMate uses Tesla's Owner API (not Fleet API) via OAuth: + +1. Access https://tesla.tail8d86e.ts.net +2. Click "Sign in with Tesla" +3. Complete OAuth flow in browser +4. Tokens are encrypted with ENCRYPTION_KEY and stored in database +5. 
TeslaMate automatically refreshes tokens as needed + +**Standalone OAuth tool:** If you need to manually obtain tokens, there's a Rust-based helper: +- Mirror: https://forge.tail8d86e.ts.net/eblume/tesla_auth.git +- Runs OAuth flow and outputs access/refresh tokens + +## Database Notes + +- TeslaMate requires PostgreSQL 17.3+ or 18.x +- The `teslamate` user has superuser privileges (required for extension management during migrations) +- Extensions used: `cube`, `earthdistance` (for geospatial calculations) + +## Related + +- [[1767747119-YCPO|BlumeOps]] +- [[argocd|ArgoCD]] +- [[postgresql|PostgreSQL]] +- [[borgmatic|Borgmatic]] + +## Log + +### Thu Jan 23 2026 + +- Initial deployment to Kubernetes +- 18 Grafana dashboards imported from TeslaMate project +- Upgraded CloudNativePG 1.25 -> 1.28 for major version upgrade support +- Upgraded PostgreSQL 17.2 -> 18.1 (required for TeslaMate 2.2.0) +- Tailscale Ingress at `tesla.tail8d86e.ts.net` +- Backup configuration added to borgmatic diff --git a/docs/transmission.md b/docs/transmission.md new file mode 100644 index 0000000..5d8e6fd --- /dev/null +++ b/docs/transmission.md @@ -0,0 +1,100 @@ +--- +id: transmission +aliases: + - transmission +tags: + - blumeops +--- + +# Transmission Management Log + +Transmission is a BitTorrent daemon running in Kubernetes, primarily used to download large ZIM archives for [[kiwix|Kiwix]]. 
+ +## Service Details + +- URL: https://torrent.tail8d86e.ts.net +- Namespace: `torrent` +- Image: `lscr.io/linuxserver/transmission:latest` +- ArgoCD app: `torrent` +- Storage: NFS PVC from sifaka (`/volume1/torrents`) + +## Useful Commands + +```bash +# View transmission logs +kubectl --context=minikube-indri -n torrent logs -f deployment/transmission + +# Check RPC connectivity (from another pod) +kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \ + curl -s http://transmission.torrent.svc.cluster.local:9091/transmission/rpc + +# Sync from ArgoCD +argocd app sync torrent +``` + +## ArgoCD Management + +Transmission is deployed via ArgoCD from `argocd/manifests/torrent/`: +- `deployment.yaml` - Transmission container with NFS volume +- `service.yaml` - ClusterIP service (port 9091) +- `ingress-tailscale.yaml` - Tailscale Ingress for web UI +- `pv-nfs.yaml` - NFS PersistentVolume +- `pvc.yaml` - PersistentVolumeClaim + +## Storage Layout + +The NFS share on sifaka (`/volume1/torrents`) has this structure: +- `/downloads/` - Active downloads and torrent metadata +- `/downloads/complete/` - Completed downloads +- `/config/` - Transmission configuration +- `/watch/` - Watch directory for .torrent files + +Kiwix reads from `/downloads/complete/` to serve ZIM archives. + +## Integration with Kiwix + +The [[kiwix]] deployment includes a torrent-sync sidecar that: +1. Reads the declarative ZIM torrent list from a ConfigMap +2. Adds missing torrents to Transmission via RPC +3. Runs on startup and every 30 minutes + +When downloads complete: +1. Transmission moves files to `/downloads/complete/` +2. The zim-watcher CronJob (in kiwix namespace) detects new ZIMs +3. Kiwix deployment is restarted to pick up new archives + +## Monitoring + +**TODO:** Write custom transmission exporter. 
Existing exporters (`metalmatze/transmission-exporter`, `sandrotosi/simple_transmission_exporter`) are incompatible with Transmission 4's changed JSON API (type mismatches in `lastScrapeTimedOut` field). + +Current monitoring via web UI at https://torrent.tail8d86e.ts.net: +- Active/seeding/paused torrent counts +- Upload/download speeds +- Disk usage + +Basic uptime monitoring via blackbox probe in [[alloy|Alloy k8s]] (see Services Health dashboard). + +## Log + +### 2026-01-22 + +- Attempted to add `metalmatze/transmission-exporter` sidecar for Prometheus metrics +- Exporter failed with JSON parsing errors - incompatible with Transmission 4 API changes +- Removed exporter sidecar, dashboard, and Prometheus scrape config +- Added basic HTTP probe via Alloy k8s blackbox exporter instead +- Deleted stale `transmission.prom` textfile from indri + +### 2026-01-21 (P6) +- **Migrated to Kubernetes** (Phase 6 of k8s migration) +- NFS PersistentVolume for storage on sifaka +- Tailscale Ingress at `torrent.tail8d86e.ts.net` +- RPC accessible to kiwix namespace for torrent sync +- Moved existing ZIM files to `/downloads/complete/` for seeding +- Retired ansible transmission role from indri + +### 2026-01-14 +- Added transmission role to ansible playbook +- Integrated with kiwix role for torrent-based ZIM downloads +- Initial setup with transmission-cli via homebrew +- Managed via brew services on port 9091 +- Metrics collected via textfile exporter diff --git a/docs/zot.md b/docs/zot.md new file mode 100644 index 0000000..578ec48 --- /dev/null +++ b/docs/zot.md @@ -0,0 +1,112 @@ +--- +id: zot +aliases: + - zot + - container-registry +tags: + - blumeops +--- + +# Zot Registry Management Log + +Zot is an OCI-native container registry running on Indri, providing: +1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits) +2. 
Private image storage for custom-built containers + +## Service Details + +- URL: https://registry.ops.eblu.me +- Local port: 5050 +- Data directory: ~/zot +- Config: ~/.config/zot/config.json +- Managed via: mcquack LaunchAgent + +## Namespace Convention + +| Path | Source | +|------|--------| +| `registry.../docker.io/*` | Cached from Docker Hub | +| `registry.../ghcr.io/*` | Cached from GHCR | +| `registry.../quay.io/*` | Cached from Quay | +| `registry.../blumeops/*` | Private images (yours) | + +## How It Works + +### Pull-Through Cache (Automatic) + +When [[minikube]] pulls an image like `docker.io/library/nginx:latest`: +1. Containerd checks zot first (via `host.minikube.internal:5050`) +2. If zot has it cached, returns immediately +3. If not, zot fetches from upstream, caches it, returns to k8s + +Cached images appear under their original registry path (e.g., `docker.io/library/nginx`). + +### Private Images (Manual Push) + +Build and push from gilbert using podman: +```bash +# Build +podman build -t registry.ops.eblu.me/blumeops/myapp:v1 . + +# Push to zot +podman push registry.ops.eblu.me/blumeops/myapp:v1 + +# Use in k8s manifest +image: registry.ops.eblu.me/blumeops/myapp:v1 +``` + +Private images go under `blumeops/*` namespace. Example: the devpi container is at `registry.ops.eblu.me/blumeops/devpi:latest`. + +### Security Model + +**Network access only** - no authentication configured. Anyone who can reach zot via Tailscale ACL can push/pull any image. Defense is the tailnet boundary. + +Zot supports htpasswd/LDAP/OIDC auth if needed in the future. + +## Minikube Integration + +The [[minikube]] cluster uses zot as a registry mirror via containerd configuration. Managed by the ansible `minikube` role. + +From inside minikube, zot is at `host.minikube.internal:5050`. Containerd tries the mirror first, falls back to upstream if not cached. 
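To check that a pull routed through the mirror actually landed in zot, the tag-list endpoint can be queried under the image's upstream path, since cached images keep their original registry path. A minimal sketch, assuming the `http://indri:5050` endpoint used in the commands in this card; `zot_tags_url` is a hypothetical helper, not part of zot or the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an image reference to the registry's tag-list URL
# (the OCI distribution endpoint /v2/<name>/tags/list).
# Assumption: zot is reachable at http://indri:5050 unless a base is given.
zot_tags_url() {
  local ref="$1"
  local base="${2:-http://indri:5050}"
  local repo="${ref%@*}"            # drop an optional @digest
  local last="${repo##*/}"          # final path segment, e.g. nginx:latest
  repo="${repo%"$last"}${last%%:*}" # drop an optional :tag from that segment
  printf '%s/v2/%s/tags/list\n' "$base" "$repo"
}

# Example: print the URL for a cached Docker Hub image.
zot_tags_url docker.io/library/nginx:latest
```

The printed URL can be passed to `curl -s ... | jq` in the same way as the tag-list command under Useful Commands.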
+ +Mirrors configured for: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io` + +## Useful Commands + +```bash +# List all cached/pushed images +curl -s http://indri:5050/v2/_catalog | jq + +# List tags for an image +curl -s http://indri:5050/v2/blumeops/devpi/tags/list | jq + +# Check service status +ssh indri 'launchctl list | grep zot' + +# View logs +ssh indri 'tail -f ~/Library/Logs/mcquack.zot.err.log' +``` + +## Log + +### 2026-01-25 +- **Migrated from Tailscale serve to Caddy** - now accessible at `registry.ops.eblu.me` +- Retired `tailscale_serve` ansible role (no longer needed) +- Updated minikube containerd config to use new URL +- Updated CI workflows and mise tasks +- Old URL (`registry.tail8d86e.ts.net`) deprecated + +### 2026-01-21 +- Verified full workflow: podman build on gilbert → push to zot → k8s pull +- Documented security model (network-only auth via Tailscale ACL) +- Updated minikube integration: now uses containerd (docker driver) instead of CRI-O (podman driver) +- Mirror endpoint changed from `host.containers.internal:5050` to `host.minikube.internal:5050` + +### 2026-01-19 +- Integrated with minikube as CRI-O registry mirror +- All k8s image pulls now go through zot cache automatically + +### 2026-01-18 +- Initial setup for k8s migration Phase 0 +- Configured pull-through cache for Docker Hub, GHCR, Quay +- Exposed via Tailscale service at registry.tail8d86e.ts.net diff --git a/mise-tasks/zk-docs b/mise-tasks/zk-docs index dbec30a..32de565 100755 --- a/mise-tasks/zk-docs +++ b/mise-tasks/zk-docs @@ -3,11 +3,12 @@ set -euo pipefail -ZK_DIR="$HOME/code/personal/zk" -MAIN_CARD="$ZK_DIR/1767747119-YCPO.md" +# Blumeops docs now live in the repo itself (symlinked into zk) +DOCS_DIR="$(cd "$(dirname "$0")/.." 
&& pwd)/docs" +MAIN_CARD="$DOCS_DIR/1767747119-YCPO.md" # Find all files tagged with blumeops (excluding main card) -other_cards=$(grep -l '^ - blumeops$' "$ZK_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort) +other_cards=$(grep -l '^ - blumeops$' "$DOCS_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort) # Concatenate: main card first, then others # Pass through any args to bat (e.g., --style=header --color=never --decorations=always)