diff --git a/CLAUDE.md b/CLAUDE.md
index d15fb77..3e69152 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -12,7 +12,7 @@ blumeops is Erich Blume's GitOps repository for personal infrastructure manageme
 1. **CRITICAL: Always use `--context=minikube-indri` with kubectl commands.** The user has work contexts configured that must never be touched. Every kubectl command must explicitly specify the context to prevent accidental operations against the wrong cluster.
-2. At the start of every session, even if the user asked to do something else, run `mise run zk-docs -- --style=header --color=never --decorations=always` to review the blumeops documentation. The docs are hosted at https://docs.ops.eblu.me and source lives in `docs/`. The `docs/zk/` cards are legacy but still useful as reference.
+2. At the start of every session, even if the user asked to do something else, run `mise run zk-docs -- --style=header --color=never --decorations=always` to prime your context with key BlumeOps documentation. The docs are hosted at https://docs.ops.eblu.me and source lives in `docs/`.
 3. When making any changes, start by making sure you're on the `main` git branch and up-to-date, and then create a feature branch. Commit often while working, and create a PR using:
 ```fish
@@ -62,7 +62,6 @@ Address each unresolved comment before proceeding.
The user will resolve comment
 ```
 ./docs/ # blumeops documentation (Diataxis structure, built with Quartz)
 ./docs/changelog.d/ # towncrier changelog fragments
-./docs/zk/ # legacy zettelkasten cards (read-only reference)
 ./mise-tasks/ # management and utility scripts run via `mise run`
 ./ansible/playbooks/ # ansible playbooks (indri.yml is primary)
 ./ansible/roles/ # ansible roles for indri-hosted services
diff --git a/README.md b/README.md
index c41fc18..f9968fc 100644
--- a/README.md
+++ b/README.md
@@ -80,11 +80,6 @@ This repo uses [Forgejo Actions](https://forgejo.org/docs/latest/user/actions/)
 ## Documentation
-Documentation lives in `docs/` and is being restructured to follow the [Diataxis](https://diataxis.fr/) framework. See [`docs/README.md`](docs/README.md) for the restructuring plan and current status.
+Documentation lives in `docs/` and follows the [Diataxis](https://diataxis.fr/) framework. Published at https://docs.ops.eblu.me.
-**Quick reference (zettelkasten cards):**
-```bash
-mise run zk-docs
-```
-
-The zk cards in `docs/zk/` use [Obsidian](https://obsidian.md) wiki-link syntax (`[[link]]`) for cross-references. Edit with any markdown editor, or use [obsidian.nvim](https://github.com/obsidian-nvim/obsidian.nvim) for enhanced navigation and completion.
+Docs use [Obsidian](https://obsidian.md) wiki-link syntax (`[[link]]`) for cross-references. Edit with any markdown editor, or use [obsidian.nvim](https://github.com/obsidian-nvim/obsidian.nvim) for enhanced navigation.
diff --git a/docs/README.md b/docs/README.md
deleted file mode 100644
index 81c9e5a..0000000
--- a/docs/README.md
+++ /dev/null
@@ -1,171 +0,0 @@
----
-title: readme
----
-
-# BlumeOps Documentation
-
-> **Note on naming**: The project is properly stylized as **BlumeOps**, though "blumeops" and "Blue Mops" are also commonly used interchangeably.
-
-This directory contains documentation for BlumeOps, Erich Blume's personal infrastructure GitOps repository.
-
-## Documentation Restructuring (In Progress)
-
-The documentation is being restructured to follow the [Diataxis](https://diataxis.fr/) documentation framework while serving multiple audiences.
-
-### Target Audiences
-
-1. **Erich (owner)** - A knowledge graph/zettelkasten for quickly recalling important facts about BlumeOps infrastructure and operations.
-
-2. **Claude/AI agents** - Memory and context enrichment for AI-assisted operations and development.
-
-3. **New external readers** - People who want to understand "what is BlumeOps?" at a high level.
-
-4. **Potential operators/contributors** - External readers who want to help operate, modify, or answer questions about BlumeOps, or onboard as a member.
-
-5. **Replicators** - People who want to duplicate this approach for their own personal infrastructure operations.
-
-### Requirements
-
-- **Source format**: Markdown with optional YAML frontmatter
-- **Editing**: Compatible with [Obsidian](https://obsidian.md) and [obsidian.nvim](https://github.com/obsidian-nvim/obsidian.nvim)
-- **Cross-references**: Wiki-link syntax (`[[link]]`) for internal links
-- **Output formats**: HTML (for web hosting) and PDF (for offline reference)
-- **Changelog**: Track significant documentation changes
-
-### Tooling
-
-**Selected**: [Quartz](https://quartz.jzhao.xyz/) - A TypeScript-based static site generator designed for Obsidian vaults. Features wiki-link support, backlinks, graph view, and excellent Obsidian compatibility.
-
-**Architecture**:
-- **Source**: Markdown files in `docs/` with optional YAML frontmatter
-- **Build**: Quartz builds static HTML/CSS/JS via Forgejo workflow
-- **Release**: Built assets published as Forgejo release attachments
-- **Hosting**: `quartz` container downloads release bundle on startup and serves via nginx
-- **URL**: `docs.ops.eblu.me` (planned)
-
-## Restructuring Phases
-
-### Phase 1a: Foundation & CI (Complete)
-- [x] Move existing zk cards to `docs/zk/`
-- [x] Update `zk-docs` mise task for new path
-- [x] Create this README with restructuring plan
-- [x] Select documentation tooling (Quartz)
-- [x] Create Quartz configuration (`quartz.config.ts`, `quartz.layout.ts`)
-- [x] Create `quartz` container for serving static sites
-- [x] Create `build-blumeops` workflow for building releases
-- [x] Test the build workflow and verify release creation (v1.0.0)
-
-**First release:** [v1.0.0](https://forge.ops.eblu.me/eblume/blumeops/releases/tag/v1.0.0)
-
-### Phase 1b: CD & Hosting (Complete)
-- [x] Build and tag `quartz` container (`mise run container-tag-and-release quartz v1.0.0`)
-- [x] Create ArgoCD manifests for `docs` deployment
-- [x] Add `docs.ops.eblu.me` to Caddy reverse proxy
-- [x] Configure deployment with `DOCS_RELEASE_URL`
-- [x] Test end-to-end: commit -> build -> release -> deploy
-- [x] Set up `CHANGELOG.md` with [towncrier](https://towncrier.readthedocs.io/)
-- [x] Add `docs.ops.eblu.me` link to homepage dashboard (via gethomepage.dev annotations)
-
-**Docs URL:** https://docs.ops.eblu.me
-
-### Phase 2: Reference (Complete)
-Information-oriented technical descriptions. Built first so other docs can link to reference material.
-
-- [x] Create `reference/` directory with index
-- [x] Service reference pages (16 services: alloy, argocd, borgmatic, 1password, forgejo, grafana, jellyfin, kiwix, loki, miniflux, navidrome, postgresql, prometheus, teslamate, transmission, zot)
-- [x] Infrastructure inventory (hosts, tailscale, routing)
-- [x] Kubernetes reference (cluster, apps)
-- [x] Storage reference (sifaka, backups)
-
-**Reference URL:** https://docs.ops.eblu.me/reference/
-
-### Phase 3: Tutorials (Complete)
-Learning-oriented content for getting started. Each tutorial explicitly identifies its target audiences.
-
-- [x] Create `tutorials/` directory with index
-- [x] "Exploring the Docs" - How to navigate documentation (All)
-- [x] "AI Assistance Guide" - Context for AI-assisted operations (AI, Owner)
-- [x] "Contributing" - Your first contribution (Contributor)
-- [x] "Adding a Service" - Deploy a new ArgoCD service (Contributor, Replicator)
-- [x] "Replicating BlumeOps" - Overview for building similar setup (Replicator)
-- [x] Replication sub-tutorials:
-  - [x] Tailscale Setup
-  - [x] Core Services (Forgejo, Zot)
-  - [x] Kubernetes Bootstrap
-  - [x] ArgoCD Config
-  - [x] Observability Stack
-- [x] New reference cards: docs service, tailscale-operator, ansible/roles
-
-**Tutorials URL:** https://docs.ops.eblu.me/tutorials/
-
-### Phase 4: How-to Guides (Complete)
-Task-oriented instructions for specific operations.
-
-- [x] Create `how-to/` directory
-- [x] Migrate operational content from zk cards
-- [x] "How to deploy a new Kubernetes service"
-- [x] "How to add a new Ansible role"
-- [x] "How to update Tailscale ACLs"
-- [x] "How to troubleshoot common issues"
-- [x] Update `exploring-the-docs` with How-to section
-
-**How-to URL:** https://docs.ops.eblu.me/how-to/
-
-### Phase 5: Explanation (Complete)
-Understanding-oriented discussion of concepts and decisions.
-
-- [x] Create `explanation/` directory
-- [x] "Why GitOps?"
- Philosophy and approach
-- [x] "Architecture Overview" - How everything fits together
-- [x] "Security Model" - Tailscale, secrets management, etc.
-- [ ] "Decision Log" - ADRs (Architecture Decision Records) - deferred
-- [x] Update `exploring-the-docs` with Explanation section
-
-**Explanation URL:** https://docs.ops.eblu.me/explanation/
-
-### Phase 6: Integration & Cleanup
-- [ ] Migrate remaining useful content from `docs/zk/`
-- [ ] Decide fate of zk cards (archive, delete, or keep as separate knowledge base)
-- [ ] Update CLAUDE.md to reference new doc structure
-- [ ] Final review of `exploring-the-docs` for completeness
-- [ ] Mirror docs to GitHub Pages for public access (optional)
-
-## Current Directory Layout
-
-```
-docs/
-├── README.md # This file
-├── CHANGELOG.md # Release changelog (built by towncrier)
-├── changelog.d/ # Towncrier news fragments
-├── reference/ # Information-oriented (Phase 2)
-├── tutorials/ # Learning-oriented (Phase 3)
-├── how-to/ # Task-oriented (Phase 4)
-├── explanation/ # Understanding-oriented (Phase 5)
-└── zk/ # Zettelkasten cards (temporary)
-    ├── 1767747119-YCPO.md # Main blumeops overview card
-    └── ... # Service-specific cards and notes
-```
-
-> **Why Reference first?** Reference docs are built before tutorials and how-to guides so that learning and task-oriented content can link to authoritative technical descriptions using wiki-links (`[[reference/service-name]]`).
-
-## Adding Changelog Entries
-
-When making changes, add a news fragment to `docs/changelog.d/`:
-
-```bash
-# Format: ..md
-# Types: feature, bugfix, infra, doc, misc
-echo "Add new feature X" > docs/changelog.d/20260203-feature-x.feature.md
-```
-
-Fragments are automatically collected into CHANGELOG.md when a release is built.
-
-## Viewing the ZK Cards
-
-To view all BlumeOps zettelkasten cards:
-
-```fish
-mise run zk-docs
-```
-
-This displays all cards tagged with `blumeops`, starting with the main overview card.
diff --git a/docs/changelog.d/phase6-cleanup.doc.md b/docs/changelog.d/phase6-cleanup.doc.md
new file mode 100644
index 0000000..055e0c4
--- /dev/null
+++ b/docs/changelog.d/phase6-cleanup.doc.md
@@ -0,0 +1 @@
+Complete Phase 6: migrate zk content, delete legacy cards, rewrite zk-docs for AI context priming
diff --git a/docs/how-to/index.md b/docs/how-to/index.md
index 6f1a29a..4f5ab11 100644
--- a/docs/how-to/index.md
+++ b/docs/how-to/index.md
@@ -20,6 +20,7 @@ Task-oriented instructions for common BlumeOps operations. These guides assume y
 | Guide | Description |
 |-------|-------------|
 | [[update-tailscale-acls]] | Update Tailscale access control policies |
+| [[use-pypi-proxy]] | Configure pip and publish packages to devpi |
 ## Documentation
diff --git a/docs/how-to/use-pypi-proxy.md b/docs/how-to/use-pypi-proxy.md
new file mode 100644
index 0000000..b96ad46
--- /dev/null
+++ b/docs/how-to/use-pypi-proxy.md
@@ -0,0 +1,61 @@
+---
+title: use-pypi-proxy
+tags:
+  - how-to
+  - python
+---
+
+# Use the PyPI Proxy
+
+How to configure clients and publish packages to [[devpi]].
+
+## Configure pip
+
+Create `~/.config/pip/pip.conf`:
+
+```ini
+[global]
+index-url = https://pypi.ops.eblu.me/root/pypi/+simple/
+trusted-host = pypi.ops.eblu.me
+```
+
+Track with chezmoi:
+```bash
+chezmoi add ~/.config/pip/pip.conf
+```
+
+## Upload Packages
+
+```bash
+# Build and publish with uv
+cd ~/code/personal/your-package
+uv build
+uv publish --publish-url https://pypi.ops.eblu.me/eblume/dev/
+
+# First time: uv will prompt for credentials
+```
+
+## Create Users/Indices
+
+```bash
+# Login as root
+uvx devpi use https://pypi.ops.eblu.me
+uvx devpi login root
+
+# Create user (prompts for password - store in 1Password)
+uvx devpi user -c USERNAME email=EMAIL
+
+# Create index inheriting from PyPI mirror
+uvx devpi index -c USERNAME/dev bases=root/pypi
+```
+
+## Verify Cache
+
+```bash
+# Check if devpi is caching
+curl -s https://pypi.ops.eblu.me/+api | jq
+```
+
+## Related
+
+- [[devpi]] - Service reference
diff --git a/docs/index.md b/docs/index.md
index 9b66ba2..0e71774 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,7 +12,3 @@ Welcome to the BlumeOps documentation.
 - [[reference/index | Reference]] - Technical specifications and service details
 - [[how-to/index | How-to]] - Task-oriented instructions for common operations
 - [[explanation/index | Explanation]] - Understanding the "why" behind BlumeOps
-
-## About
-
-[[README | Documentation Home]] - Restructuring plan and changelog info
diff --git a/docs/reference/index.md b/docs/reference/index.md
index 8900424..8c0ed62 100644
--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -31,6 +31,7 @@ Individual service reference cards with URLs and configuration details.
 | [[teslamate]] | Tesla data logger | k8s |
 | [[transmission]] | BitTorrent daemon | k8s |
 | [[zot]] | Container registry | indri |
+| [[devpi]] | PyPI caching proxy | k8s |
 | [[docs]] | Documentation site (Quartz) | k8s |
 ## Infrastructure
diff --git a/docs/reference/infrastructure/indri.md b/docs/reference/infrastructure/indri.md
index 055b62f..6351585 100644
--- a/docs/reference/infrastructure/indri.md
+++ b/docs/reference/infrastructure/indri.md
@@ -32,6 +32,12 @@ Primary BlumeOps server. Mac Mini M1 (2020).
 **Kubernetes (via minikube):**
 - [[apps | All k8s applications]]
+## Maintenance Notes
+
+**Sleep prevention:** Uses Amphetamine (App Store) to prevent sleep. If Amphetamine crashes after extended uptime, consider switching to `pmset` or `caffeinate` via ansible.
+
+**Passwordless sudo:** Configured for `erichblume` user (`/etc/sudoers.d/erichblume`) to allow ansible `become: true` without prompts. Acceptable given Tailscale is the trust boundary.
+
 ## Related
 - [[routing | Routing]] - Port mappings
diff --git a/docs/reference/services/borgmatic.md b/docs/reference/services/borgmatic.md
index b4b9a95..b0c8bc5 100644
--- a/docs/reference/services/borgmatic.md
+++ b/docs/reference/services/borgmatic.md
@@ -25,7 +25,9 @@ Daily backup system using Borg backup, running on indri.
 - `/opt/homebrew/var/forgejo` - Git forge data
 - `~/.config/borgmatic` - Borgmatic config
 - `~/Documents` - Personal documents
-- `~/Pictures` - Photos
+- `~/Pictures` - Photos (see note below)
+
+**iCloud Photos note:** macOS Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally. Borgmatic only backs up what's on disk, so iCloud-only photos are NOT backed up via this method.
 **Databases:**
 - `miniflux` on [[postgresql]]
diff --git a/docs/reference/services/devpi.md b/docs/reference/services/devpi.md
new file mode 100644
index 0000000..9c20c32
--- /dev/null
+++ b/docs/reference/services/devpi.md
@@ -0,0 +1,37 @@
+---
+title: devpi
+tags:
+  - service
+  - python
+---
+
+# devpi (PyPI Proxy)
+
+PyPI caching proxy and private package index.
+
+## Quick Reference
+
+| Property | Value |
+|----------|-------|
+| **URL** | https://pypi.ops.eblu.me |
+| **Namespace** | `devpi` |
+| **ArgoCD App** | `devpi` |
+| **Storage** | 50Gi PVC |
+| **Image** | `registry.ops.eblu.me/blumeops/devpi:latest` |
+
+## Indices
+
+| Index | Purpose |
+|-------|---------|
+| `root/pypi` | PyPI mirror/cache (auto-created) |
+| `eblume/dev` | Private packages (inherits from root/pypi) |
+
+## Credentials
+
+Root password stored in 1Password (blumeops vault), injected via ExternalSecret.
+
+## Related
+
+- [[how-to/use-pypi-proxy]] - Client configuration and package uploads
+- [[argocd]] - Deployment
+- [[1password]] - Secrets management
diff --git a/docs/tutorials/exploring-the-docs.md b/docs/tutorials/exploring-the-docs.md
index 113c26b..1e356ec 100644
--- a/docs/tutorials/exploring-the-docs.md
+++ b/docs/tutorials/exploring-the-docs.md
@@ -29,8 +29,8 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
 You probably want quick access to operational details:
 - [[how-to/index|How-to guides]] for common operations (deploy, troubleshoot, update ACLs)
 - [[reference/index|Reference]] has service URLs, commands, and config locations
-- The `zk-docs` mise task still works for legacy zettelkasten access
 - [[ai-assistance-guide]] explains how to work effectively with Claude
+- Run `mise run zk-docs` to prime AI context with key documentation
 ### For Claude/AI Agents
@@ -73,11 +73,12 @@ When reading on the web (docs.ops.eblu.me), these render as clickable links.
The
 Pre-commit hooks automatically validate that all wiki-links point to existing files and that link targets are unambiguous.
-## Legacy Content
+## AI Context Priming
-The `docs/zk/` directory contains zettelkasten cards from before the restructuring. These are read-only reference - new content goes in the structured sections. The cards will eventually be migrated or archived.
+The `zk-docs` mise task concatenates key documentation files for AI context:
-To view legacy cards:
 ```bash
 mise run zk-docs
 ```
+
+This outputs the AI assistance guide, reference index, how-to index, architecture overview, and tutorials index - providing Claude with essential context for BlumeOps operations.
diff --git a/docs/zk/1767747119-YCPO.md b/docs/zk/1767747119-YCPO.md
deleted file mode 100644
index 771f485..0000000
--- a/docs/zk/1767747119-YCPO.md
+++ /dev/null
@@ -1,244 +0,0 @@
----
-id: 1767747119-YCPO
-tags:
-- blumeops
----
-
-BlumeOps, aka Blue Mops, refers to my own personal computing operations stack.
-
-Source code: https://forge.ops.eblu.me/eblume/blumeops (mirrored to https://github.com/eblume/blumeops)
-
-# Infrastructure
-
-| Host | Description | Notes |
-|----------------------------------|--------------------------|----------------------------------------------------|
-| **[[indri|Indri]]** | Mac Mini M1, 2020 | Primary server, 2TB internal disk |
-| **[Sifaka](https://nas.ops.eblu.me)** | Synology NAS | 10.9TB RAID 5, backup target |
-| **Gilbert** | 13" MacBook Air M4, 2025 | Primary workstation |
-| **Mouse** | 13" MacBook Air M2 | Allison's laptop |
-| **[UniFi](https://192.168.1.1)** | UniFi Express 7 | Home WiFi network ([cloud](https://unifi.ui.com)) |
-| **Dwarf** | iPad Air | Employer-provided, off tailnet |
-
-All devices are connected via [Tailscale](https://login.tailscale.com/) tailnet `tail8d86e.ts.net`.
-
-## Tailscale Access Control
-
-ACLs are managed via Pulumi in `pulumi/policy.hujson`. See [[pulumi]] for deployment commands.
-
-**Important lesson learned:**
-- Don't tag user-owned devices (like gilbert) - tagging converts them to "tagged devices" which lose user identity and break user-based SSH rules
-
-### Groups
-
-| Group | Members | Purpose |
-|---------------------|--------------------------------------------|----------------------------------|
-| `group:allisonflix` | , | Jellyfin media access |
-
-### Device Tags
-
-| Tag | Devices | Purpose |
-|------------------|-------------|--------------------------------------------|
-| `tag:homelab` | indri | Server infrastructure |
-| `tag:nas` | sifaka | Network-attached storage for backups |
-| `tag:blumeops` | indri, sifaka | Resources managed by Pulumi IaC |
-| `tag:registry` | indri | Container registry access |
-| `tag:k8s-api` | indri | Kubernetes API server access |
-
-### Access Matrix
-
-| Source | Kiwix | Forge | PyPI | Miniflux | PostgreSQL | NAS | Grafana | Loki |
-|--------------------------|-------|-------|------|----------|------------|-----|---------|------|
-| `autogroup:admin` | Y | Y | Y | Y | Y | Y | Y | Y |
-| `autogroup:member` | Y | Y | Y | Y | Y | - | - | - |
-| `tag:homelab` | - | - | - | - | - | Y | - | - |
-
-Notes:
-- **Admins** - full access to all services via `autogroup:admin`
-- **Allison** (member) - member services only, no Grafana/Loki/NAS
-
-### SSH Access
-
-| Source | Destinations | Auth |
-|-------------------------|-----------------|-------------|
-| `autogroup:member` | `autogroup:self`| check |
-| `autogroup:admin` | `tag:homelab` | check (12h) |
-| `autogroup:admin` | `tag:nas` | check (12h) |
-
-# Services
-
-Services are accessible via two DNS domains:
-- **`*.ops.eblu.me`** - Caddy reverse proxy (reachable from k8s pods, docker containers, and tailnet)
-- **`*.tail8d86e.ts.net`** - Tailscale MagicDNS (tailnet clients only, not from k8s/docker)
-
-## Caddy Services (`*.ops.eblu.me`)
-
-Caddy proxies to k8s services via their Tailscale endpoints (traffic stays local on indri).
-Both `*.ops.eblu.me` and `*.tail8d86e.ts.net` URLs work - use ops.eblu.me for access from pods/containers.
-
-| Service | URL | Description | Management Log |
-|----------------|-----------------------------------|------------------------------------|-----------------|
-| **Homepage** | https://go.ops.eblu.me | Service dashboard / start page | — |
-| **Forgejo** | https://forge.ops.eblu.me | Git hosting (SSH: port 2222) | [[forgejo]] |
-| **Registry** | https://registry.ops.eblu.me | OCI container registry (Zot) | [[zot]] |
-| **Sifaka NAS** | https://nas.ops.eblu.me | Synology NAS dashboard | — |
-| **Grafana** | https://grafana.ops.eblu.me | Dashboards & observability (k8s) | [[grafana]] |
-| **ArgoCD** | https://argocd.ops.eblu.me | GitOps continuous delivery (k8s) | [[argocd]] |
-| **Prometheus** | https://prometheus.ops.eblu.me | Metrics collection (k8s) | [[prometheus]] |
-| **Loki** | https://loki.ops.eblu.me | Log aggregation (k8s) | [[loki]] |
-| **Miniflux** | https://feed.ops.eblu.me | RSS/Atom feed reader (k8s) | [[miniflux]] |
-| **PyPI** | https://pypi.ops.eblu.me | PyPI caching proxy (devpi, k8s) | [[pypi]] |
-| **Kiwix** | https://kiwix.ops.eblu.me | Offline Wikipedia & ZIM (k8s) | [[argocd]] |
-| **Torrent** | https://torrent.ops.eblu.me | BitTorrent daemon web UI (k8s) | [[argocd]] |
-| **TeslaMate** | https://tesla.ops.eblu.me | Tesla data logger (k8s) | [[teslamate]] |
-| **Immich** | https://photos.ops.eblu.me | Photo management (k8s Helm, CNPG) | [[argocd]] |
-| **DJ** | https://dj.ops.eblu.me | Music streaming server (Navidrome) | [[navidrome]] |
-| **PostgreSQL** | pg.ops.eblu.me:5432 | Database server (k8s CloudNativePG)| [[postgresql]] |
-
-## Tailscale-Only Services (`*.tail8d86e.ts.net`)
-
-These services are only accessible via Tailscale (not from k8s pods/containers):
-
-| Service | URL | Description | Management Log |
-|----------------|-----------------------------------|------------------------------------|-----------------|
-|
**Kubernetes** | https://k8s.tail8d86e.ts.net | Minikube API (TCP passthrough) | [[minikube]] |
-| **Jellyfin** | https://jellyfin.ops.eblu.me | Media server (VideoToolbox HW) | [[jellyfin]] |
-
-Supporting services (not directly user-facing):
-
-| Service | Description | Management Log |
-|---------------------|---------------------------------------|------------------|
-| **Alloy (indri)** | Metrics & logs collector (indri host) | [[alloy]] |
-| **Alloy (k8s)** | Pod log collection & service probes | [[alloy]] |
-| **Kube-state-metrics** | K8s resource metrics (pods, deployments) | [[prometheus]] |
-| **Borgmatic** | Daily backups to Sifaka NAS (2:00 AM) | [[borgmatic]] |
-
-## Port Map (Indri)
-
-| Port | Service | Protocol | Binding | Notes |
-|-------|---------------|----------|-------------|--------------------------------------------|
-| 443 | Caddy | HTTPS | 0.0.0.0 | Reverse proxy for `*.ops.eblu.me` |
-| 2222 | Caddy L4 | TCP | 0.0.0.0 | SSH proxy → Forgejo (localhost:2200) |
-| 5432 | Caddy L4 | TCP | 0.0.0.0 | PostgreSQL proxy → k8s pg |
-| 2200 | Forgejo SSH | TCP | localhost | Built-in SSH server |
-| 3001 | Forgejo | HTTP | localhost | Web UI (proxied by Caddy) |
-| 5050 | Zot | HTTP | localhost | Registry API (proxied by Caddy) |
-| 8096 | Jellyfin | HTTP | localhost | Media server (proxied by Caddy) |
-| 44491 | K8s API | HTTPS | 0.0.0.0 | Minikube API server (via Tailscale k8s.*) |
-
-# Service Management
-
-## Pulumi (Tailnet IaC)
-
-Tailnet-wide configuration (ACLs, tags, DNS) is managed via Pulumi. See [[pulumi]] for details.
-
-```bash
-mise run tailnet-preview # preview ACL changes
-mise run tailnet-up # apply ACL changes
-```
-
-Edit `pulumi/policy.hujson` to modify ACLs or add new tags.
-
-## Ansible
-
-Services on Indri are managed via ansible.
Playbooks live in the `ansible/` directory of the blumeops repo:
-
-```bash
-mise run provision-indri # runs ansible/playbooks/indri.yml
-mise run indri-services-check # checks health of all services
-```
-
-Run with `--check --diff` first to preview changes, or target specific services:
-
-```bash
-mise run provision-indri -- --check --diff # dry run
-mise run provision-indri -- --tags alloy # only alloy
-mise run provision-indri -- --tags zot,borgmatic # multiple tags
-```
-
-## Adding a New Service
-
-### Indri Services (via Caddy)
-
-For services running directly on indri that need to be accessible from k8s pods:
-
-1. Host service locally on localhost (e.g., localhost:3000)
-2. Add service to `ansible/roles/caddy/defaults/main.yml` under `caddy_services`
-3. Run `mise run provision-indri -- --tags caddy`
-4. Add backup entry in borgmatic role if needed
-
-DNS is handled by a wildcard record (`*.ops.eblu.me` → indri's Tailscale IP) managed via Pulumi in `pulumi/gandi/`.
-
-Access via `https://foo.ops.eblu.me`.
-
-### K8s Services (via Tailscale Ingress)
-
-For services running in minikube:
-
-1. Create Kubernetes manifests in `argocd/manifests//`
-2. Add ArgoCD Application in `argocd/apps/.yaml`
-3. Add Tailscale Ingress annotation for `*.tail8d86e.ts.net` hostname
-4. Add Homepage annotations to the Ingress for dashboard discovery (see below)
-5. Add Caddy proxy entry in `ansible/roles/caddy/defaults/main.yml`
-6. Sync via ArgoCD: `argocd app sync `
-
-Access via `https://foo.ops.eblu.me` (preferred) or `https://foo.tail8d86e.ts.net`.
-
-**Note:** K8s services using Tailscale Ingress are NOT accessible from other k8s pods or docker containers. Use Caddy (`*.ops.eblu.me`) if pod-to-service communication is needed.
-
-**Homepage annotations** for automatic dashboard discovery:
-```yaml
-annotations:
-  gethomepage.dev/enabled: "true"
-  gethomepage.dev/name: "My Service"
-  gethomepage.dev/group: "Apps"
-  gethomepage.dev/icon: "myservice.png"
-  gethomepage.dev/description: "Short description"
-  gethomepage.dev/href: "https://myservice.ops.eblu.me"
-  gethomepage.dev/pod-selector: "app=myservice"
-```
-
-Icons use [Dashboard Icons](https://github.com/walkxcode/dashboard-icons) format (e.g., `grafana.png`, `prometheus.png`). The `pod-selector` annotation enables pod status badges on the dashboard.
-
-## Secrets Management
-
-Kubernetes secrets are managed via [[external-secrets|External Secrets Operator]], which syncs from 1Password via 1Password Connect.
-
-To add a secret to a k8s service:
-1. Ensure the 1Password item exists in the `blumeops` vault
-2. Create an `ExternalSecret` manifest in the service's directory
-3. Reference the `onepassword-blumeops` ClusterSecretStore
-4. Sync via ArgoCD
-
-See [[external-secrets]] for detailed usage and bootstrap instructions.
-
-# Notes
-
-## Go DNS Resolution on macOS
-
-**Important lesson learned (2026-01-22):**
-Go programs built with `CGO_ENABLED=0` (pure Go) use a DNS resolver that reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files. This breaks Tailscale MagicDNS resolution.
-
-**Solution:** Build Go programs with `CGO_ENABLED=1` to use the macOS native resolver. This is why [[alloy|Grafana Alloy]] is built from source rather than using the Homebrew bottle.
-
-## Remote Kubernetes Access (from Gilbert)
-
-The minikube cluster on indri is accessible from gilbert via Tailscale service.
-Cluster was created with `--apiserver-names=k8s.tail8d86e.ts.net,indri --listen-address=0.0.0.0`.
-API server exposed at `https://k8s.tail8d86e.ts.net` via TCP passthrough (preserves mTLS).
-
-**Fish abbreviations** (in `~/.config/fish/config.fish`):
-- `ki` -> `kubectl --context=minikube-indri`
-- `k9i` -> `k9s --context=minikube-indri`
-- `k9` -> `k9s`
-
-```bash
-# Quick access via abbreviations
-ki get nodes
-k9i
-
-# Or explicitly set context
-kubectl config use-context minikube-indri
-kubectl get nodes
-```
-
-Credentials are stored in 1Password and fetched via exec credential plugin. See [[minikube]] for details.
diff --git a/docs/zk/1768246525-RVRY.md b/docs/zk/1768246525-RVRY.md
deleted file mode 100644
index b2bca43..0000000
--- a/docs/zk/1768246525-RVRY.md
+++ /dev/null
@@ -1,133 +0,0 @@
----
-id: 1768246525-RVRY
-tags:
-- blumeops
-- forgejo
-- git
-- scm
-- forge
----
-
-# Mon Jan 12 11:35
-
-```fish
-❯ brew install forgejo
-❯ brew --prefix forgejo
-/opt/homebrew/opt/forgejo
-❯ brew services start forgejo
-==> Successfully started `forgejo` (label: homebrew.mxcl.forgejo)
-```
-
-From the service definition I can see that this runs as:
-
-```bash
-/opt/homebrew/opt/forgejo/bin/forgejo web --work-path /opt/homebrew/var/forgejo > /opt/homebrew/var/log/forgejo.log 2> /opt/homebrew/var/log/forgejo.log
-```
-It sounds from the docs like this means the config file should live at:
-```
-/opt/homebrew/var/forgejo/custom/conf/app.ini
-```
-Ah, based on the logs, it looks like forgejo has picked port 3000 which is used by grafana:
-```
-❯ lsof -nP -iTCP:3000 -sTCP:LISTEN
-COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
-grafana 1530 erichblume 15u IPv6 0x4acfad8b21dcb063 0t0 TCP *:3000 (LISTEN)
-```
-Ok I've set a basic config for port 3001, and then gone through the basic app setup. Looks like it's working! Not sure how SSH works yet though. Let's get this service registered.
-
-Ok so the next issue is that I want to use ssh as my primary git interface, and
-I want that to look to users like I'm using port 22 but I want to host it on
-indri which has its own separate ssh setup. Hmm. Let's tell forgejo to use port
-2200.
Ah perfect, we can set SSH_PORT to 22 and SSH_LISTEN_PORT to 2200.
-
-Hmm, let's stop running this as me and run as a new user, 'forgejo'.
-```fish
-sudo sysadminctl -addUser forgejo -system -shell /usr/bin/false
-sudo chown -R forgejo:staff /opt/homebrew/var/forgejo
-```
-Ok, I think I need to switch all my services on this host over to a services file.
-
-Wow, missing from the above is like 4 hours of deep diving in to the particulars of tailscale service definition hosting. In the end, I never got a services file to work - and yes, I did remember to advertise! Adding to the complexity is that I didn't discover until the end that you can't do "hairpinning", ie you CANNOT use the tailnet service name from the host doing that hosting. I probably had it fixed at some point hours ago and ruled it out because I didn't know about the hairpinning issue. So anyway... what ended up working was to just use the cli:
-```fish
-tailscale serve --service="svc:forge" --tcp=22 tcp://localhost:2200
-tailscale serve --service="svc:forge" --https=443 http://localhost:3001
-```
-That's it. Nothing else needed, worked right away. Sheesh. (Ok there was also a
-solid hour spent on permission issues... I honestly don't know how it's working
-now, as there is now a `forgejo` user and the config says to use it but the
-files are all owned by `erichblume:staff` but with group permissions set... in
-any case, it friggin' works. So I'm happy.
-
-# Configuration (Ansible-Managed)
-
-As of 2026-01-23, the `app.ini` is managed by ansible:
-- Template: `ansible/roles/forgejo/templates/app.ini.j2`
-- Secrets fetched from 1Password in playbook pre_tasks
-- Secrets item: "Forgejo Secrets" in blumeops vault (fields: `lfs-jwt-secret`, `internal-token`, `oauth2-jwt-secret`, `runner_reg`)
-
-Deploy config changes:
-```bash
-mise run provision-indri -- --tags forgejo
-```
-
-# Forgejo Actions (CI/CD)
-
-## Runner (k8s)
-
-The Forgejo runner runs in Kubernetes with Docker-in-Docker (DinD) for container builds.
-
-**Architecture:**
-- Runner daemon + DinD sidecar in a single pod
-- Jobs execute in containers using the `k8s` label
-- DinD exposes Docker API on `tcp://127.0.0.1:2375`
-- Pods reach `*.ops.eblu.me` services via Caddy reverse proxy
-
-**Components:**
-- ArgoCD app: `argocd/apps/forgejo-runner.yaml`
-- Manifests: `argocd/manifests/forgejo-runner/`
-- Job image: `registry.ops.eblu.me/blumeops/forgejo-runner` (Node.js + Docker CLI)
-- Job image source: `containers/forgejo-runner/`
-
-**Deployment:**
-```bash
-# Apply secret (contains runner token from 1Password)
-op inject -i argocd/manifests/forgejo-runner/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
-
-# Sync via ArgoCD
-argocd app sync forgejo-runner
-```
-
-**View logs:**
-```bash
-kubectl --context=minikube-indri logs -n forgejo-runner -l app=forgejo-runner -c runner
-```
-
-## Container Build Workflow
-
-Container images are built via `.forgejo/workflows/build-container.yaml`, triggered by tags matching `-v`.
-
-**Release a container:**
-```bash
-mise run container-list # See available containers
-mise run container-tag-and-release nettest v1.0.0 # Tag and trigger build
-```
-
-**Test container** (`containers/nettest/`): Network connectivity test for debugging CI/CD.
-
-## Workflows
-
-Workflows live in `.forgejo/workflows/` (not `.github/workflows/`).
-
-**Important**: Use `github.*` context variables, not `gitea.*`.
Forgejo supports both at runtime, but: -1. The Forgejo web UI schema validator only recognizes `github.*` -2. `actionlint` pre-commit hook validates workflows locally (catches errors before push) -3. Pass untrusted inputs (like `github.head_ref`) through env vars for security - -## Runner Token - -Stored in 1Password "Forgejo Secrets" item, field `runner_reg`. - -To create a new token: -1. Go to https://forge.ops.eblu.me/admin/actions/runners -2. Click "Create new Runner" -3. Copy the token and update 1Password diff --git a/docs/zk/1768283761-TRXN.md b/docs/zk/1768283761-TRXN.md deleted file mode 100644 index 3116c5a..0000000 --- a/docs/zk/1768283761-TRXN.md +++ /dev/null @@ -1,93 +0,0 @@ ---- -id: 1768283761-TRXN -tags: -- blumeops ---- - -# Prometheus Management Log - -Prometheus provides metrics storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure, running in Kubernetes (minikube on indri). - -## Service Details - -- URL: https://prometheus.tail8d86e.ts.net -- Namespace: `monitoring` -- Image: `prom/prometheus:v3.2.1` -- ArgoCD app: `prometheus` -- Storage: 50Gi PVC - -## Data Sources - -### Remote Write (from Alloy) -- Indri system metrics via [[alloy|Grafana Alloy]] remote_write -- Textfile metrics: minikube, borgmatic, zot, jellyfin - -### Scrape Targets -- `sifaka:9100` - Synology NAS (node_exporter in Docker) -- `cnpg-metrics.tail8d86e.ts.net:9187` - CloudNativePG PostgreSQL metrics -- `kube-state-metrics.monitoring.svc:8080` - Kubernetes resource metrics (pods, deployments, etc.) 
- -## Useful Commands - -```bash -# View logs -kubectl --context=minikube-indri -n monitoring logs -f prometheus-0 - -# Check targets -curl -s https://prometheus.tail8d86e.ts.net/api/v1/targets | jq '.data.activeTargets[].scrapeUrl' - -# Sync from ArgoCD -argocd app sync prometheus -``` - -## ArgoCD Management - -Prometheus is deployed via ArgoCD from `argocd/manifests/prometheus/`: -- `statefulset.yaml` - StatefulSet with 50Gi PVC -- `configmap.yaml` - Prometheus configuration -- `service.yaml` - ClusterIP service -- `ingress-tailscale.yaml` - Tailscale Ingress - -## Log - -### Wed Jan 22 2026 (observability cleanup) - -- Added kube-state-metrics scrape target for k8s resource metrics -- Enhanced Minikube dashboard with namespace filtering and resource usage panels -- Uses `kube_pod_info`, `kube_pod_container_resource_requests`, etc. - -### Wed Jan 22 2026 (later) - -- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet -- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` -- Remote write endpoint now at k8s service, Alloy updated to push there -- Retired ansible prometheus role from indri -- Added ACL grant for `tag:homelab` → `tag:k8s` on port 443 for Alloy access - -### Wed Jan 22 2026 - -Added CNPG PostgreSQL metrics scraping. The CloudNativePG operator exposes Prometheus metrics on port 9187. Exposed via Tailscale at `cnpg-metrics.tail8d86e.ts.net:9187` and added to scrape_configs as job `cnpg-postgres`. - -### Wed Jan 15 2026 - -Prometheus now accepts metrics via remote_write from [[alloy|Grafana Alloy]]. The `--web.enable-remote-write-receiver` flag was added to enable this. - -Indri metrics are no longer scraped - they're pushed by Alloy. Sifaka still uses traditional scraping via node_exporter running in Docker on the Synology. - -### Mon Jan 13 2026 - -Prometheus is now managed via ansible in [[1767747119-YCPO|blumeops]]. Configuration files are templated from the ansible role. 
- -### Mon Jan 12 2026 21:56 - -Prometheus was stood up about a week ago at this point. I am currently renaming -`localhost` to `indri` in the scrape_configs. While I'm here I'm going to see -if I can add Synology stats. - -I'm adding Container Manager to Sifaka now. I should probably have a Sifaka -management log, but not yet. Downloaded prom/node-exporter and made a container -for it. Using the latest tag because I'm nasty. - -Done. Adding to scrape configs. - -Ok, it didn't like the indri hostname. Could probably fix somehow with either magicdns or /etc/hosts but for now, I'm using `relabel_configs`. This is working. Gotta go to bed. diff --git a/docs/zk/1768457769-LOCK.md b/docs/zk/1768457769-LOCK.md deleted file mode 100644 index f8eb8cd..0000000 --- a/docs/zk/1768457769-LOCK.md +++ /dev/null @@ -1,146 +0,0 @@ ---- -id: 1768457769-LOCK -tags: -- blumeops ---- - -# PyPI / devpi Management Log - -PyPI caching proxy running in Kubernetes (minikube on indri) via devpi-server. - -## Service Details - -- URL: https://pypi.tail8d86e.ts.net -- Namespace: devpi -- Image: registry.tail8d86e.ts.net/blumeops/devpi:latest (custom image with devpi-server + devpi-web) -- ArgoCD app: devpi -- Storage: 50Gi PVC - -## Useful Commands - -```bash -# View logs -kubectl --context=minikube-indri -n devpi logs -f statefulset/devpi - -# Restart pod -kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi - -# Check health -curl https://pypi.tail8d86e.ts.net/+api - -# Sync from ArgoCD -argocd app sync devpi -``` - -## ArgoCD Management - -Devpi is deployed via ArgoCD from `argocd/manifests/devpi/`: -- `statefulset.yaml` - StatefulSet with 50Gi PVC -- `service.yaml` - ClusterIP service -- `ingress-tailscale.yaml` - Tailscale Ingress for external access -- `Dockerfile` - Custom image with startup script -- `start.sh` - Auto-initialization script - -## Users and Indices - -### Structure - -- `root/pypi` - PyPI mirror/cache (auto-created) -- `eblume/dev` - Private 
packages index (inherits from root/pypi) - -### Creating a User and Index - -```bash -# Login as root -uvx devpi use https://pypi.tail8d86e.ts.net -uvx devpi login root - -# Create user (prompts for password - store in 1Password) -uvx devpi user -c USERNAME email=EMAIL - -# Create index inheriting from PyPI mirror -uvx devpi index -c USERNAME/dev bases=root/pypi -``` - -### Uploading Packages (with uv) - -```bash -# Store credentials (one-time, prompts for username/password) -uv auth login https://pypi.tail8d86e.ts.net - -# Build and publish -cd ~/code/personal/your-package -uv build -uv publish --publish-url https://pypi.tail8d86e.ts.net/eblume/dev/ -``` - -Note: The "trusted publishing failed" warning is expected (devpi doesn't support OIDC). - -### Uploading Packages (with devpi-client) - -```bash -# Login as the user -uvx devpi login USERNAME - -# Use the index -uvx devpi use eblume/dev - -# Upload from project directory -uvx devpi upload -``` - -## Client Configuration - -On workstations, configure pip to use the proxy. - -**pip.conf** (`~/.config/pip/pip.conf`): -```ini -[global] -index-url = https://pypi.tail8d86e.ts.net/root/pypi/+simple/ -trusted-host = pypi.tail8d86e.ts.net -``` - -After creating/editing, track with chezmoi: -```bash -chezmoi add ~/.config/pip/pip.conf -``` - -## Credentials - -- Root password stored in 1Password (blumeops vault) -- Injected into k8s via `devpi-root` secret from `secret-root.yaml.tpl` - -## Backup - -Private packages (`eblume/dev` index) are stored in the devpi PVC. The PyPI mirror cache (`root/pypi`) is not backed up as it can be re-fetched. - -**TODO**: Add devpi PVC backup to borgmatic once k8s volume backup strategy is established. 
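
The index layout described above (`root/pypi` as the mirror, `eblume/dev` inheriting from it) maps onto client URLs in a regular way. The following is a minimal sketch, not part of the original notes: `devpi_index_url` and the `DEVPI_HOST` variable are hypothetical names, and the `USER/INDEX/+simple/` shape for the private index is assumed from the `root/pypi` URL already shown in `pip.conf`.

```bash
#!/usr/bin/env bash
# Build the pip "simple" index URL for a devpi index (USER/INDEX).
# DEVPI_HOST is a hypothetical override; the default is the service URL
# listed under Service Details.
devpi_index_url() {
  local user="$1" index="$2"
  local host="${DEVPI_HOST:-https://pypi.tail8d86e.ts.net}"
  printf '%s/%s/%s/+simple/\n' "$host" "$user" "$index"
}

# Example (not run here): install from the private index, which should fall
# back to the root/pypi mirror through inheritance. PACKAGE is a placeholder.
# pip install --index-url "$(devpi_index_url eblume dev)" PACKAGE
```

Keeping the URL in one helper means a hostname change (for example to an `ops.eblu.me` name) would only need to be made in one place.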
- -## Related - -- [[1767747119-YCPO|BlumeOps project card]] -- [[argocd|ArgoCD]] for deployment -- [[minikube|Kubernetes cluster]] - -## Log - -### Mon Jan 20 2026 - -- **Migrated to Kubernetes** (Phase 5 of k8s migration) -- Custom container image with devpi-server + devpi-web + auto-init startup script -- StatefulSet with 50Gi PVC for data persistence -- Tailscale Ingress at `pypi.tail8d86e.ts.net` -- Root password from 1Password secret, auto-initialized on first run -- Verified pip caching proxy and mcquack package upload -- **Key learnings:** - - Minikube CRI-O can't resolve Tailscale hostnames - added registry mirror config - - devpi-web Whoosh indexer needs ~2Gi during initial PyPI index build - - Kubernetes auto-sets `DEVPI_PORT` for service discovery - renamed to `DEVPI_LISTEN_PORT` -- Removed LaunchAgent from indri, cleared Tailscale serve entry - -### Previous (indri era) - -- Initial setup with devpi on indri via mcquack LaunchAgent -- Connected via Tailscale at pypi.tail8d86e.ts.net -- Created eblume/dev index for private packages -- Metrics collection via textfile exporter diff --git a/docs/zk/1768506761-GHUW.md b/docs/zk/1768506761-GHUW.md deleted file mode 100644 index 8288840..0000000 --- a/docs/zk/1768506761-GHUW.md +++ /dev/null @@ -1,164 +0,0 @@ ---- -id: 1768506761-GHUW -tags: -- blumeops ---- - -# Grafana Alloy Management Log - -Grafana Alloy is a unified observability collector with two deployments: -1. **Indri (host)** - System metrics and service logs from macOS host -2. 
**Kubernetes (DaemonSet)** - Automatic pod log collection and service health probes - -## Service Details - -- Binary: `~/.local/bin/alloy` (built from source with CGO_ENABLED=1) -- Config: `~/.config/grafana-alloy/config.alloy` -- Data: `~/.local/share/grafana-alloy/` -- Logs: `~/Library/Logs/mcquack.alloy.{out,err}.log` -- Managed via: mcquack LaunchAgent (`mcquack.eblume.alloy`) - -**Why built from source?** The Homebrew bottle is built with `CGO_ENABLED=0`, which uses Go's pure DNS resolver. This resolver reads `/etc/resolv.conf` directly and ignores macOS `/etc/resolver/*` files, breaking Tailscale MagicDNS hostname resolution. Building with `CGO_ENABLED=1` uses the macOS native resolver. - -## What Alloy Collects - -### Metrics -- System metrics via `prometheus.exporter.unix` (same metrics as node_exporter) -- Textfile collector reads from `/opt/homebrew/var/node_exporter/textfile/` - - `minikube.prom` - Minikube cluster status - - `borgmatic.prom` - Backup status metrics - - `zot.prom` - Container registry metrics - - `jellyfin.prom` - Jellyfin media server metrics -- Zot registry metrics scraped from `http://localhost:5050/metrics` -- Metrics pushed to Prometheus (k8s) via remote_write at `https://prometheus.tail8d86e.ts.net/api/v1/write` - -### Logs -Collects logs from all services on Indri: - -**Brew services:** -- forgejo -- tailscale - -**mcquack LaunchAgents:** -- alloy (stdout/stderr) -- borgmatic (stdout/stderr) -- zot (stdout/stderr) -- jellyfin (stdout/stderr) - -Logs pushed to Loki (k8s) at `https://loki.tail8d86e.ts.net/loki/api/v1/push`. 
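
To check the log path end to end without waiting for Alloy, a single line can be pushed by hand. This is a rough sketch under assumptions: `loki_payload` is a hypothetical helper, the `service` and `host` label names are assumed to match what Alloy attaches, and the log line is inserted without JSON escaping, so it should not contain quotes or backslashes.

```bash
#!/usr/bin/env bash
# Build a Loki push-API payload carrying one log line.
# Arguments: service label, log line, timestamp in nanoseconds since epoch.
# Assumption: labels `service` and `host` mirror what Alloy sets on Indri.
# Caveat: no JSON escaping is done on the log line.
loki_payload() {
  local service="$1" line="$2" ts_ns="$3"
  printf '{"streams":[{"stream":{"service":"%s","host":"indri"},"values":[["%s","%s"]]}]}\n' \
    "$service" "$ts_ns" "$line"
}

# Example (not run here): push a test line to the endpoint named above.
# loki_payload forgejo "hello from curl" "$(date +%s)000000000" |
#   curl -s -H 'Content-Type: application/json' --data-binary @- \
#     https://loki.tail8d86e.ts.net/loki/api/v1/push
```

If the push succeeds, the line should show up in Grafana Explore under the same `service` label.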
- -## Useful Commands - -```bash -# Check service status -ssh indri 'launchctl list | grep alloy' - -# View alloy logs -ssh indri 'tail -f ~/Library/Logs/mcquack.alloy.err.log' - -# Restart service -ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist && launchctl load ~/Library/LaunchAgents/mcquack.eblume.alloy.plist' -``` - -## Building from Source - -Alloy must be built with CGO to use macOS native DNS resolver (required for Tailscale MagicDNS): - -```bash -# On gilbert (dev workstation): -git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy -cd ~/code/3rd/alloy && mise use go@1.25 node yarn -mise x -- make alloy -scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy -``` - -Then run ansible to deploy the config and LaunchAgent. - -## Ansible Management (Indri) - -Alloy on Indri is managed via ansible in [[1767747119-YCPO|blumeops]]. - -```bash -mise run provision-indri -- --tags alloy -``` - -## Kubernetes Alloy (alloy-k8s) - -A separate Alloy DaemonSet runs in k8s for: -- **Automatic pod log collection** - discovers and collects logs from all pods -- **Service health probes** - HTTP blackbox probes for k8s services - -### Service Details (k8s) - -- Namespace: `alloy` -- Image: `grafana/alloy:v1.8.2` -- ArgoCD app: `alloy-k8s` -- Manifests: `argocd/manifests/alloy-k8s/` - -### What k8s Alloy Collects - -**Pod logs (automatic discovery):** -- All pods in all namespaces via `loki.source.kubernetes` -- Labels: namespace, pod, container, node - -**Service health probes:** -- miniflux, kiwix, transmission, devpi, argocd -- Metrics: `probe_success`, `probe_duration_seconds` -- Labels: `job="integrations/blackbox/"` - -### Useful Commands (k8s Alloy) - -```bash -# View alloy-k8s logs -kubectl --context=minikube-indri -n alloy logs -f daemonset/alloy - -# Check running config -kubectl --context=minikube-indri -n alloy get configmap alloy-config -o yaml - -# Sync from ArgoCD -argocd app sync alloy-k8s -``` - -## 
Log - -### Wed Jan 22 2026 (later) - -- **Added Alloy k8s DaemonSet** for automatic pod log collection -- Logs from all k8s pods now forwarded to Loki with automatic discovery -- Added service health probes for miniflux, kiwix, transmission, devpi, argocd -- New "Services Health" Grafana dashboard shows probe metrics -- Deleted stale textfile metrics (`devpi.prom`, `transmission.prom`) from indri -- Deleted stale data directories (`/opt/homebrew/var/loki`, `/opt/homebrew/var/prometheus`) - -### Wed Jan 22 2026 - -- **Rebuilt from source with CGO_ENABLED=1** - required for Tailscale MagicDNS resolution -- Migrated from Homebrew to mcquack LaunchAgent management -- Updated remote_write to push to k8s Prometheus at `prometheus.tail8d86e.ts.net` -- Updated log push to k8s Loki at `loki.tail8d86e.ts.net` -- Removed prometheus/loki log collection (now running in k8s) -- Binary now at `~/.local/bin/alloy`, config at `~/.config/grafana-alloy/` -- Added build instructions to ansible role defaults - -### Mon Jan 20 2026 - -- Removed devpi log collection (devpi migrated to k8s) -- Removed devpi.prom textfile collection (metrics role retired) -- Removed grafana log collection (grafana migrated to k8s in P2) - -### Wed Jan 15 2026 - -- Initial setup replacing node_exporter -- Configured metrics push via remote_write to Prometheus -- Configured log collection for all services, forwarding to Loki - -### Thu Jan 30 2026 - -- Removed Plex log and metrics collection (replaced by Jellyfin) -- Added Jellyfin log collection via mcquack LaunchAgent logs -- Added jellyfin.prom textfile metrics - -### Wed Jan 15 2026 (later) - -- Added Plex Media Server log collection (removed 2026-01-30) -- Added plex.prom metrics from plex_metrics role (removed 2026-01-30) diff --git a/docs/zk/1768506761-XGYX.md b/docs/zk/1768506761-XGYX.md deleted file mode 100644 index 763ddd4..0000000 --- a/docs/zk/1768506761-XGYX.md +++ /dev/null @@ -1,80 +0,0 @@ ---- -id: 1768506761-XGYX -tags: -- blumeops ---- - 
-# Loki Management Log - -Loki is a log aggregation system running in Kubernetes (minikube on indri), providing log storage and querying for the [[1767747119-YCPO|blumeops]] infrastructure. - -## Service Details - -- URL: https://loki.tail8d86e.ts.net -- Namespace: `monitoring` -- Image: `grafana/loki:3.4.2` -- ArgoCD app: `loki` -- Storage: 50Gi PVC -- Retention: 31 days - -## Architecture - -- Single-node deployment with filesystem storage -- TSDB index with 24h period -- Logs collected by [[alloy|Grafana Alloy]] and pushed via Loki API -- Queried via Grafana using the Loki datasource - -## Useful Commands - -```bash -# View logs -kubectl --context=minikube-indri -n monitoring logs -f loki-0 - -# Check if Loki is ready -curl -s https://loki.tail8d86e.ts.net/ready - -# Sync from ArgoCD -argocd app sync loki -``` - -## Grafana Integration - -Loki is configured as a datasource in Grafana. To explore logs: - -1. Go to https://grafana.tail8d86e.ts.net/explore -2. Select "Loki" datasource -3. 
Use LogQL queries: - - `{service="forgejo"}` - all forgejo logs - - `{service="borgmatic", stream="stderr"}` - borgmatic errors - - `{host="indri"} |= "error"` - all logs containing "error" - -## ArgoCD Management - -Loki is deployed via ArgoCD from `argocd/manifests/loki/`: -- `statefulset.yaml` - StatefulSet with 50Gi PVC -- `configmap.yaml` - Loki configuration -- `service.yaml` - ClusterIP service -- `ingress-tailscale.yaml` - Tailscale Ingress - -## Log - -### Thu Jan 23 2026 - -- Suppressed noisy `v1 Endpoints is deprecated` warning from minikube storage-provisioner ([upstream issue](https://github.com/kubernetes/minikube/issues/21009)) -- Added JSON field extraction for zot compatibility (`message` vs `msg`) -- Removed logfmt parsing stage - `stage.match` selectors don't prevent Alloy from logging internal decode errors, and most structured logs use JSON anyway -- Fixed devpi dashboard JSON escaping - -### Wed Jan 22 2026 - -- **Migrated to Kubernetes** - moved from Homebrew on indri to k8s StatefulSet -- Exposed via Tailscale Ingress at `loki.tail8d86e.ts.net` -- Alloy updated to push logs to k8s endpoint -- Retired ansible loki role from indri - -### Wed Jan 15 2026 - -- Initial setup with single-node filesystem storage -- Configured 31-day retention with compactor -- Integrated with Grafana as datasource -- Logs collected via Alloy from all services diff --git a/docs/zk/argocd-log.md b/docs/zk/argocd-log.md deleted file mode 100644 index 35c2cf4..0000000 --- a/docs/zk/argocd-log.md +++ /dev/null @@ -1,137 +0,0 @@ ---- -id: argocd-log -tags: -- blumeops ---- - -# ArgoCD Management Log - -ArgoCD provides GitOps continuous delivery for the [[minikube]] cluster on Indri. 
- -## Service Details - -- URL: https://argocd.tail8d86e.ts.net -- Namespace: `argocd` -- Git source: `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git` -- Manifests path: `argocd/` - -## Sync Policy Decision - -**Choice**: Manual sync for workload apps, auto-sync only for app-of-apps. - -**Rationale** (decided 2026-01-19 during Phase 1 migration): -- During migration, we want explicit control over what gets deployed -- Auto-sync could deploy broken changes while we're still learning the stack -- The app-of-apps (`apps`) auto-syncs so new Application manifests appear automatically -- But those Applications have manual sync, so actual workload changes require `argocd app sync ` - -**Pattern**: -| Application | Sync Policy | Why | -|-------------|-------------|-----| -| `apps` | Automated | Picks up new Application manifests from git | -| `argocd` | Manual | Self-management changes should be deliberate | -| `tailscale-operator` | Manual | Infrastructure changes need review | -| `cloudnative-pg` | Manual | Operator upgrades need care | -| `blumeops-pg` | Manual | Database changes are sensitive | -| `grafana` | Manual | Observability stack changes need review | -| `grafana-config` | Manual | Dashboard changes should be deliberate | -| `miniflux` | Manual | Application changes need review | -| `devpi` | Manual | PyPI proxy changes need review | - -**Future consideration**: After migration stabilizes, consider enabling auto-sync for stable workloads. Keep manual sync for infrastructure (operators, databases). 
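
Since workload apps are synced manually, it can help to look at the pending diff before every sync. The helper below is a hedged sketch of that habit, not something from the repository: `review_and_sync` and the `ARGOCD` override are hypothetical names, and only the `argocd app diff` and `argocd app sync` subcommands are taken from the CLI itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the pending diff, then sync one manual-sync app.
# ARGOCD can be overridden (e.g. ARGOCD=echo) to print the commands instead
# of running them.
review_and_sync() {
  local app="$1"
  local argocd_bin="${ARGOCD:-argocd}"
  if [ -z "$app" ]; then
    echo "usage: review_and_sync APP" >&2
    return 2
  fi
  # `argocd app diff` exits non-zero when differences exist, so a non-zero
  # status here is expected and must not stop the sync.
  "$argocd_bin" app diff "$app" || true
  "$argocd_bin" app sync "$app"
}

# Example (not run here):
# review_and_sync grafana
```

A stricter variant could pause for confirmation between the diff and the sync; that is left out here to keep the sketch short.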
- -## CLI Access - -```bash -# Login (uses Tailscale for network, prompts for password) -argocd login argocd.tail8d86e.ts.net --grpc-web - -# List apps -argocd app list - -# Sync an app -argocd app sync - -# Check diff before sync -argocd app diff - -# Get app details -argocd app get -``` - -## Applications - -| App | Path | Description | -|-----|------|-------------| -| `apps` | `argocd/apps/` | App-of-apps root | -| `argocd` | `argocd/manifests/argocd/` | ArgoCD self-management | -| `tailscale-operator` | `argocd/manifests/tailscale-operator/` | Tailscale k8s operator | -| `cloudnative-pg` | Helm chart (forge mirror) | PostgreSQL operator | -| `blumeops-pg` | `argocd/manifests/databases/` | PostgreSQL cluster | -| `prometheus` | `argocd/manifests/prometheus/` | Metrics storage | -| `loki` | `argocd/manifests/loki/` | Log aggregation | -| `grafana` | Helm chart (forge mirror) | Grafana dashboards | -| `grafana-config` | `argocd/manifests/grafana-config/` | Grafana ingress & dashboards | -| `alloy-k8s` | `argocd/manifests/alloy-k8s/` | Pod log collection & service probes | -| `kube-state-metrics` | `argocd/manifests/kube-state-metrics/` | K8s resource metrics | -| `miniflux` | `argocd/manifests/miniflux/` | RSS feed reader | -| `devpi` | `argocd/manifests/devpi/` | PyPI caching proxy | -| `torrent` | `argocd/manifests/torrent/` | BitTorrent daemon | -| `kiwix` | `argocd/manifests/kiwix/` | Offline Wikipedia & ZIM archives | -| `forgejo-runner` | `argocd/manifests/forgejo-runner/` | Forgejo Actions CI runner (host mode) | - -## Credentials - -- Admin password stored in 1Password (updated from initial auto-generated password) -- Git access via deploy key (SSH) stored in 1Password - -## Log - -### 2026-01-23 (CI/CD Bootstrap Phase 1) -- Added `forgejo-runner` - Forgejo Actions CI runner -- Runner uses host mode (jobs run directly in runner container, no Docker needed) -- Labels: `ubuntu-latest`, `ubuntu-22.04` -- Note: Stock runner lacks Node.js, so 
`actions/checkout@v4` doesn't work - use git clone instead -- See [[forgejo]] for runner token management and workflow examples - -### 2026-01-22 (Observability Cleanup) -- Added `alloy-k8s` - DaemonSet for automatic pod log collection and service health probes -- Added `kube-state-metrics` - provides k8s resource metrics (pod counts, resource requests, etc.) -- Enhanced Minikube dashboard with namespace filtering and resource usage panels -- Added "Services Health" dashboard with probe metrics for all k8s services -- Fixed macOS dashboard instance variable to only show macOS hosts -- Cleaned up stale data: removed old textfile metrics and directories from indri -- Removed stale `/opt/homebrew/var/loki` from borgmatic backup sources - -### 2026-01-22 (Phase 7) -- **Migrated Prometheus and Loki to k8s** - completed observability stack migration -- Both now running as StatefulSets with 50Gi PVCs -- Exposed via Tailscale Ingress at `prometheus.tail8d86e.ts.net` and `loki.tail8d86e.ts.net` -- Grafana datasources updated to use k8s-internal service URLs -- Alloy rebuilt with CGO for Tailscale DNS resolution, pushes to k8s endpoints -- Retired ansible prometheus and loki roles from indri - -### 2026-01-21 (Phase 6) -- Added torrent (Transmission BitTorrent) to k8s -- Added kiwix (offline Wikipedia & ZIM archives) to k8s -- NFS storage from sifaka for shared torrent/ZIM data - -### 2026-01-20 (Phase 5) -- Added devpi (PyPI caching proxy) to k8s -- Custom container image in zot registry with devpi-server + devpi-web -- StatefulSet with 50Gi PVC for data persistence -- Changed `apps` Application to manual sync (was auto-sync with prune) - -### 2026-01-19 (Phase 2) -- Migrated Grafana from Homebrew/Ansible to Kubernetes -- Helm chart repos now mirrored to forge (cloudnative-pg-charts, grafana-helm-charts) -- SSH credential template (`repo-creds-forge`) for all forge repos -- Added indri SSH host key to ArgoCD known_hosts -- Tailscale service cutover: deleted old svc:grafana 
from Tailscale admin to free hostname -- Retired ansible grafana role - -### 2026-01-19 (Phase 1) -- Completed Phase 1 deployment -- Decided on manual sync policy for workloads -- Using internal [[forgejo]] as git source (not GitHub mirror) -- Exposed via Tailscale Ingress with Let's Encrypt TLS diff --git a/docs/zk/borgmatic-log.md b/docs/zk/borgmatic-log.md deleted file mode 100644 index 4edbb90..0000000 --- a/docs/zk/borgmatic-log.md +++ /dev/null @@ -1,173 +0,0 @@ ---- -id: borgmatic-log -tags: -- blumeops ---- - -# Borgmatic Management Log - -Borgmatic runs daily backups from Indri to Sifaka NAS using Borg backup. - -## Service Details - -- Installed via: mise (pipx) -- Config: `~/.config/borgmatic/config.yaml` (ansible-managed) -- Schedule: Daily at 2:00 AM via LaunchAgent -- Repository: `/Volumes/backups/borg/` on Sifaka - -## What Gets Backed Up - -**Directories:** -- `~/code/personal/zk` - Zettelkasten (primary) -- `/opt/homebrew/var/forgejo` - Git forge data -- `~/.config/borgmatic` - Borgmatic config itself -- `~/Documents` - Personal documents -- `~/Pictures` - Photos (see note below) - -**Note on iCloud Photos:** macOS Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally. Borgmatic only backs up what's on disk, so iCloud-only photos are NOT backed up. If you need full photo backups via borgmatic, either disable "Optimize Mac Storage" in Photos preferences, or use a tool like osxphotos which forces downloads. See log entry 2026-01-28. 
- -**Databases:** -- `miniflux` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me) -- `teslamate` PostgreSQL database on k8s CloudNativePG cluster (pg.ops.eblu.me) - -**Not backed up (by design):** -- ZIM archives in `~/transmission/` - re-downloadable via torrent -- Prometheus metrics - ephemeral data -- Loki logs - ephemeral (now in k8s PVC) -- devpi data - in k8s PVC, backup strategy TBD - -## PostgreSQL Backup - -Borgmatic uses native `postgresql_databases` support to stream `pg_dump` directly to Borg: -- No intermediate files needed -- Database keeps running (no downtime) -- Consistent transactional snapshots -- Uses `borgmatic` user with `pg_read_all_data` role -- Password read from `~/.pgpass` (managed by borgmatic ansible role) -- Uses explicit `pg_dump_command` path (`/opt/homebrew/opt/postgresql@18/bin/pg_dump`) since LaunchAgent doesn't have homebrew in PATH -- Uses explicit `local_path` (`/opt/homebrew/bin/borg`) for same reason - -**Databases backed up:** -- `pg.ops.eblu.me:5432/miniflux` - CloudNativePG cluster in k8s -- `pg.ops.eblu.me:5432/teslamate` - CloudNativePG cluster in k8s - -## Ansible Management - -Borgmatic is fully managed via ansible in [[1767747119-YCPO|blumeops]]: - -```bash -mise run provision-indri -- --tags borgmatic -``` - -The role deploys: -- `~/.config/borgmatic/config.yaml` - Main configuration -- LaunchAgent plist for scheduled runs - -## Useful Commands - -```bash -# List archives -ssh indri 'mise x -- borgmatic list' - -# Extract from latest archive -ssh indri 'mise x -- borgmatic extract --archive latest --path /some/path' - -# Run backup manually -ssh indri 'mise x -- borgmatic create --verbosity 1' - -# Check repository health -ssh indri 'mise x -- borgmatic check' -``` - -## Retention Policy - -- 7 daily backups -- 12 monthly backups -- 1000 yearly backups (effectively forever) - -## Monitoring - -Borgmatic metrics are collected hourly via a script at `~/bin/borgmatic-metrics` and exposed to Prometheus 
via the node_exporter textfile collector. - -View the Grafana dashboard at: https://grafana.tail8d86e.ts.net (select "Borgmatic Backups" dashboard) - -Metrics include: -- `borgmatic_up` - repository accessibility -- `borgmatic_repo_deduplicated_size_bytes` - actual disk usage -- `borgmatic_last_archive_original_size_bytes` - size of data being backed up -- `borgmatic_last_archive_deduplicated_size_bytes` - new data added per backup -- `borgmatic_archive_count` - number of archives -- `borgmatic_last_archive_timestamp` - when last backup ran - -```bash -# Check metrics file -ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom' - -# Check metrics LaunchAgent status -ssh indri 'launchctl list | grep borgmatic-metrics' -``` - -## Log - -### Tue Jan 28 2026 - -- Investigated massive backup size increase (~69GB deduplicated, ~94GB per archive) -- Root cause: immich-sync role (added Jan 26, removed Jan 28) used osxphotos to export photos -- **Lesson learned:** osxphotos forces Photos.app to download all iCloud originals locally -- Photos.app defaults to "Optimize Mac Storage" which keeps only thumbnails locally -- Before immich-sync: borgmatic was backing up thumbnails (~few GB) -- After immich-sync: borgmatic now has full 42GB of photo originals -- This is actually a bonus - provides redundant photo backup alongside iCloud and Immich -- Retention policy means these photos will be kept in yearly archives essentially forever -- **Future plan:** Once Immich (on sifaka "photos" volume with Synology offsite backup) is fully set up, Pictures may be removed from borgmatic as redundant - -### Thu Jan 23 2026 - -- Note: Forgejo `app.ini` is now managed by ansible (secrets in 1Password) -- `/opt/homebrew/var/forgejo` still backed up for git repositories and data -- But `app.ini` recovery no longer depends on borgmatic (can be regenerated via ansible) - -### Wed Jan 22 2026 - -- Removed `/opt/homebrew/var/loki` from backup sources (stale data from pre-k8s 
migration) -- Loki now runs in k8s with ephemeral storage - logs are not backed up by design -- Verified backup integrity after cleanup - -### Mon Jan 20 2026 (P5) - -- Removed `~/devpi` from backup sources (devpi migrated to k8s) -- devpi data now in k8s PVC - backup strategy TBD - -### Sun Jan 19 2026 (P4) - -- Removed localhost PostgreSQL backup (brew pg retired) -- Updated to backup only `pg.tail8d86e.ts.net` (k8s CloudNativePG) -- Moved .pgpass management from postgresql role to borgmatic role - -### Sun Jan 19 2026 (P3) - -- Fixed borgmatic failing to find `borg` binary by adding `local_path` option to config -- Added k8s-pg (CloudNativePG cluster) backup alongside brew PostgreSQL -- Added ACL grant for `tag:homelab` → `tag:k8s` on port 5432 for backup access -- Successfully tested disaster recovery: restored miniflux data from borgmatic dump to k8s-pg -- Created `borgmatic` user in k8s-pg via CloudNativePG managed roles -- Both localhost and k8s-pg databases backed up during migration period - -### Sat Jan 18 2026 - -- Fixed borgmatic-metrics script failing in LaunchAgent context by using absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`) instead of `mise x -- borg` -- This was causing the Grafana dashboard to show "Repository Status: DOWN" and missing time series data - -### Fri Jan 17 2026 - -- Fixed PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found") -- Removed `~/code/3rd/kiwix-tools` from backups (was just symlinks, ZIM archives are re-downloadable) -- Enabled Loki log backup (removed from exclude_patterns) -- Added borgmatic_metrics role for Prometheus metrics collection -- Added Grafana dashboard for backup monitoring (size trends, dedup ratio, time since last backup) - -### Thu Jan 16 2026 - -- Moved config from manual management to ansible-managed template -- Added `postgresql_databases` backup for miniflux database -- Config now deployed via 
`ansible/roles/borgmatic/templates/config.yaml.j2` diff --git a/docs/zk/external-secrets-log.md b/docs/zk/external-secrets-log.md deleted file mode 100644 index dc50f3f..0000000 --- a/docs/zk/external-secrets-log.md +++ /dev/null @@ -1,71 +0,0 @@ ---- -id: external-secrets-log -tags: -- blumeops ---- - -# External Secrets Operator - -External Secrets Operator (ESO) syncs secrets from 1Password to Kubernetes Secrets via 1Password Connect. - -## Architecture - -``` -1Password Cloud - | - v -1Password Connect (namespace: 1password) - | - v -External Secrets Operator (namespace: external-secrets) - | - v -Native Kubernetes Secrets -``` - -## Usage - -ClusterSecretStore `onepassword-blumeops` provides access to the blumeops vault. See `argocd/manifests/devpi/external-secret.yaml` for a simple example. - -**Important:** 1Password Connect doesn't support the `?ssh-format=openssh` query parameter. SSH keys must be stored as Secure Notes with the OpenSSH-formatted key (see `argocd-forge-ssh-key` item). - -```bash -# Check all ExternalSecrets -kubectl --context=minikube-indri get externalsecret -A - -# Find 1Password field names -op item get --vault blumeops --format json | jq '.fields[] | .label' -``` - -## Bootstrap (One-Time Setup) - -If reinstalling from scratch: - -1. Create Connect server credentials: - ```bash - op connect server create blumeops --vaults blumeops - op connect token create blumeops --server --vault blumeops - ``` - -2. Store in 1Password item "1Password Connect": - - `credentials-file`: raw JSON - - `credentials-base64`: base64-encoded JSON - - `token`: access token - -3. Apply bootstrap secret: - ```bash - kubectl --context=minikube-indri create namespace 1password - op inject -i argocd/manifests/1password-connect/secret-credentials.yaml.tpl | \ - kubectl --context=minikube-indri apply -f - - ``` - -4. 
Sync apps in order: - - `argocd app sync 1password-connect` - - `argocd app sync external-secrets-crds` - - `argocd app sync external-secrets` - - `argocd app sync external-secrets-config` - -## Related - -- [[1767747119-YCPO|BlumeOps]] -- [[argocd|ArgoCD]] diff --git a/docs/zk/grafana-log.md b/docs/zk/grafana-log.md deleted file mode 100644 index c01402a..0000000 --- a/docs/zk/grafana-log.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -id: grafana-log -tags: - - blumeops ---- - -# Grafana Management Log - -Grafana provides dashboards and observability for [[blumeops]]. - -## Service Details - -- URL: https://grafana.ops.eblu.me (also https://grafana.tail8d86e.ts.net) -- Namespace: `monitoring` -- Helm chart: grafana (mirrored to forge) -- Values: `argocd/manifests/grafana/values.yaml` -- Dashboards: `argocd/manifests/grafana-config/dashboards/` - -## Embedding Note - -Grafana panel embedding via iframes was attempted for Homepage but didn't work well: -- Homepage's iframe widget doesn't support width constraints (only height) -- Grafana's "Public Dashboards" feature doesn't support template variables or PostgreSQL datasources -- Anonymous auth would be required, which exposes all dashboards - -Current config has `allow_embedding: false`. If revisiting this, see git history for the iframe attempt (2026-01-30). - -## Datasources - -| Name | Type | URL | -|------|------|-----| -| Prometheus | prometheus | `http://prometheus.monitoring.svc.cluster.local:9090` | -| Loki | loki | `http://loki.monitoring.svc.cluster.local:3100` | -| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` | - -## Dashboard Provisioning - -Dashboards are provisioned via ConfigMaps with label `grafana_dashboard: "1"`. The sidecar watches for these and loads them automatically. - -To add a dashboard: -1. Create ConfigMap in `argocd/manifests/grafana-config/dashboards/` -2. Add label `grafana_dashboard: "1"` -3. 
Optionally add annotation `grafana_folder: "FolderName"` for organization -4. Sync the `grafana-config` ArgoCD app - -## Log - -### 2026-01-30 -- Attempted Grafana iframe embeds for Homepage metrics panel -- Issues: width constraints don't work, some panels fail to load -- Reverted to authenticated-only access (no anonymous auth) - -### 2026-01-19 (Phase 2) -- Migrated from Homebrew/Ansible to Kubernetes -- Helm chart mirrored to forge -- Exposed via Tailscale Ingress diff --git a/docs/zk/indri-log.md b/docs/zk/indri-log.md deleted file mode 100644 index 1be8846..0000000 --- a/docs/zk/indri-log.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -id: indri-log -tags: -- blumeops ---- - -# Indri Maintenance Log - -Indri is a Mac Mini M1 (2020) serving as the primary [[1767747119-YCPO|BlumeOps]] server. - -## Host Details - -- Model: Mac mini M1, 2020 (Macmini9,1) -- Storage: 2TB internal SSD -- macOS: 15.7.3 (Sequoia) -- Role: Primary server for homelab services - -## Passwordless Sudo - -Configured passwordless sudo for `erichblume` user to allow ansible `become: true` tasks to run without password prompts: - -```bash -# Config at /etc/sudoers.d/erichblume -erichblume ALL=(ALL) NOPASSWD: ALL -``` - -This is acceptable given the security model - tailnet access is the trust boundary. - -## Sleep Prevention - -Indri must stay awake to serve network requests. Currently using **Amphetamine** (App Store) to prevent sleep. - -**Configuration:** -- Start Session At Launch: enabled -- Default Duration: indefinite -- Allow Closed-Display Sleep: enabled (no display attached) - -**Known Issue:** Amphetamine can crash after extended uptime (~12 days observed), leaving the system unprotected. 
If this becomes a recurring problem, consider switching to system-level sleep prevention: - -```bash -# Option 1: Disable sleep via pmset (requires sudo) -sudo pmset -c sleep 0 displaysleep 0 - -# Option 2: Use caffeinate daemon via LaunchAgent -# Create ~/Library/LaunchAgents/com.local.caffeinate.plist -caffeinate -s # -s = prevent sleep on AC power -``` - -These could be managed via ansible for reliability. - -## Log - -### Mon Jan 20 2026 - -**Amphetamine crash caused overnight sleep** - -- Amphetamine 5.3.2 crashed at 19:08 on Jan 19 (segfault in `objc_release` during timer callback) -- System went to sleep at 19:20, stayed asleep overnight -- Discovered when services were unreachable; manually restarted Amphetamine at ~07:30 -- Crash report: `~/Library/Logs/DiagnosticReports/Amphetamine-2026-01-19-190921.ips` -- Root cause: Memory management bug in Amphetamine during long-running session (~12 days uptime) -- Action: Monitoring for now; if recurs, will implement `pmset`/`caffeinate` via ansible diff --git a/docs/zk/jellyfin-log.md b/docs/zk/jellyfin-log.md deleted file mode 100644 index f3e9899..0000000 --- a/docs/zk/jellyfin-log.md +++ /dev/null @@ -1,88 +0,0 @@ ---- -id: jellyfin-log -tags: - - blumeops ---- - -# Jellyfin Management Log - -Jellyfin is a free, open-source media server running natively on [[indri|Indri]] for full VideoToolbox hardware transcoding support. 
- -## Service Details - -- URL: https://jellyfin.ops.eblu.me -- Port: 8096 (localhost only, proxied via Caddy) -- Data directory: `~/Library/Application Support/jellyfin` -- Media path: `/Volumes/allisonflix` (NFS from sifaka) -- LaunchAgent: `mcquack.jellyfin` - -## Useful Commands - -```bash -# Check LaunchAgent status -ssh indri 'launchctl list | grep jellyfin' - -# View logs -ssh indri 'tail -f ~/Library/Logs/mcquack.jellyfin.err.log' - -# Check port is listening -ssh indri 'lsof -nP -iTCP:8096 -sTCP:LISTEN' - -# Restart Jellyfin -ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist && launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist' - -# Check metrics -ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/jellyfin.prom' -``` - -## Hardware Transcoding - -Jellyfin uses Apple VideoToolbox for hardware-accelerated transcoding on the M1 Mac Mini. - -**Capabilities:** -- H.264 encode/decode: Hardware -- HEVC (H.265) encode/decode: Hardware -- AV1 decode: Software only (requires M3+) -- HDR to SDR tone mapping: VPP (hardware) -- Concurrent 4K streams: ~3 with HDR tonemapping - -**Configuration** (Dashboard > Playback): -1. Hardware Acceleration: Apple VideoToolbox -2. Allow hardware encoding: Enabled -3. VPP Tone mapping: Enabled (for HDR to SDR) - -## Observability - -- Metrics: Collected via `jellyfin_metrics` ansible role to Prometheus textfile -- Logs: Forwarded to Loki via Alloy (`service="jellyfin"`) -- Dashboard: "Jellyfin Media Server" in Grafana - -### Metrics collected: -- `jellyfin_up` - Server availability -- `jellyfin_version_info` - Server version -- `jellyfin_library_items{library,type}` - Library counts -- `jellyfin_sessions_total` - Active sessions -- `jellyfin_sessions_playing` - Playing sessions -- `jellyfin_transcode_sessions_total` - Transcoding sessions - -## API Key Setup - -Metrics collection requires an API key: - -1. Open https://jellyfin.ops.eblu.me -2. Go to Dashboard > API Keys > Add -3. 
Create key with description "metrics" -4. Save to indri: -```bash -ssh indri 'echo "YOUR_API_KEY" > ~/.jellyfin-api-key && chmod 600 ~/.jellyfin-api-key' -``` - -## Log - -### 2026-01-30 (Initial Deployment) -- Deployed Jellyfin natively on indri via Ansible -- Installed via Homebrew cask, managed via LaunchAgent -- Added Caddy routing for `jellyfin.ops.eblu.me` -- Added metrics collection (jellyfin_metrics role) -- Added log collection via Alloy -- Created Grafana dashboard diff --git a/docs/zk/kiwix-log.md b/docs/zk/kiwix-log.md deleted file mode 100644 index 44ef9de..0000000 --- a/docs/zk/kiwix-log.md +++ /dev/null @@ -1,101 +0,0 @@ ---- -id: kiwix-log -tags: - - blumeops ---- - -# Kiwix Management Log - -Kiwix serves offline Wikipedia (and other ZIM archives) in Kubernetes via Tailscale at https://kiwix.tail8d86e.ts.net. - -## Service Details - -- URL: https://kiwix.tail8d86e.ts.net -- Namespace: `kiwix` -- Image: `ghcr.io/kiwix/kiwix-serve:3.8.1` -- ArgoCD app: `kiwix` -- Storage: NFS mount from sifaka (`/volume1/torrents`) - -## Architecture - -The kiwix deployment has two components: - -1. **kiwix-serve** - Main container serving ZIM files at port 80 -2. **torrent-sync** - Sidecar that syncs declarative ZIM torrent list to Transmission - -A CronJob (`zim-watcher`) runs hourly to detect new ZIM files and trigger a deployment restart when needed. 
- -## Useful Commands - -```bash -# View kiwix logs -kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c kiwix-serve - -# View torrent sync logs -kubectl --context=minikube-indri -n kiwix logs -f deployment/kiwix -c torrent-sync - -# Check ZIM watcher job -kubectl --context=minikube-indri -n kiwix get cronjob zim-watcher - -# Manually trigger ZIM watcher -kubectl --context=minikube-indri -n kiwix create job --from=cronjob/zim-watcher zim-watcher-manual - -# Sync from ArgoCD -argocd app sync kiwix -``` - -## ArgoCD Management - -Kiwix is deployed via ArgoCD from `argocd/manifests/kiwix/`: -- `deployment.yaml` - Kiwix-serve + torrent-sync sidecar -- `service.yaml` - ClusterIP service -- `ingress-tailscale.yaml` - Tailscale Ingress -- `configmap-zim-torrents.yaml` - Declarative list of ZIM torrents to download -- `configmap-sync-script.yaml` - Script to sync torrents to Transmission -- `cronjob-zim-watcher.yaml` - Hourly job to restart kiwix on new ZIMs - -## Adding New ZIM Archives - -1. Edit `argocd/manifests/kiwix/configmap-zim-torrents.yaml` -2. Add the torrent URL from https://download.kiwix.org/zim/ -3. Sync the kiwix app: `argocd app sync kiwix` -4. The torrent-sync sidecar will add the torrent to [[transmission|Transmission]] -5. Once downloaded, the zim-watcher CronJob will detect it and restart kiwix - -## Configured Archives - -The declarative torrent list includes: -- Wikipedia top 1M English articles with images -- Project Gutenberg (60,000+ public domain books) -- iFixit repair guides -- Stack Exchange sites (SuperUser, Math, etc.) -- LibreTexts textbooks (Bio, Chem, Eng, Math, Phys, Humanities) -- DevDocs (developer documentation bundles) - -See `argocd/manifests/kiwix/configmap-zim-torrents.yaml` for the full list. - -## Storage - -ZIM files are stored on sifaka NAS at `/volume1/torrents/complete/`. The kiwix pod mounts this directory via NFS. 
- -**Note**: The NFS mount works because minikube uses the docker driver which NATs through indri's LAN IP, allowing direct access to sifaka. - -## Log - -### 2026-01-21 (P6) -- **Migrated to Kubernetes** (Phase 6 of k8s migration) -- Direct NFS mount from sifaka (no PVC, shared with transmission) -- Torrent-sync sidecar adds configured ZIMs to Transmission -- ZIM-watcher CronJob restarts deployment when new files appear -- Tailscale Ingress at `kiwix.tail8d86e.ts.net` -- Retired ansible kiwix role from indri - -### 2026-01-14 -- Added transmission integration for background torrent downloads -- Enabled Gutenberg, iFixit, SuperUser, Math SE, and all LibreTexts archives - -### 2026-01-13 -- Added kiwix role to ansible playbook -- Operationalized ZIM archive downloads with configurable list -- Initial setup with kiwix-tools binary on indri -- Managed via LaunchAgent on port 5501 diff --git a/docs/zk/miniflux-log.md b/docs/zk/miniflux-log.md deleted file mode 100644 index 526c6f3..0000000 --- a/docs/zk/miniflux-log.md +++ /dev/null @@ -1,79 +0,0 @@ ---- -id: miniflux-log -tags: -- blumeops ---- - -# Miniflux Management Log - -Miniflux is a minimalist RSS/Atom feed reader running in Kubernetes (minikube on indri). 
- -## Service Details - -- URL: https://feed.tail8d86e.ts.net -- Namespace: miniflux -- Image: ghcr.io/miniflux/miniflux:latest -- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net) -- ArgoCD app: miniflux - -## Useful Commands - -```bash -# View logs -kubectl -n miniflux logs -f deployment/miniflux - -# Restart deployment -kubectl -n miniflux rollout restart deployment/miniflux - -# Check health -curl https://feed.tail8d86e.ts.net/healthcheck - -# Sync from ArgoCD -argocd app sync miniflux -``` - -## ArgoCD Management - -Miniflux is deployed via ArgoCD from `argocd/manifests/miniflux/`: -- `deployment.yaml` - Deployment with environment configuration -- `service.yaml` - ClusterIP service -- `ingress-tailscale.yaml` - Tailscale Ingress for external access - -## Credentials - -The miniflux database user password is auto-generated by CloudNativePG and stored in the `blumeops-pg-app` secret in the databases namespace. - -To recreate the miniflux-db secret: -```bash -kubectl create secret generic miniflux-db -n miniflux \ - --from-literal=url="$(kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)" -``` - -## Features - -- Keyboard shortcuts for efficient reading -- Fever and Google Reader API compatible -- Mobile-friendly web interface -- OPML import/export -- Content scraping for full articles - -## Backup - -Feed subscriptions and read state stored in [[postgresql]], backed up via borgmatic's postgresql_databases hook. 
- -## Log - -### Sun Jan 19 2026 - -- **Migrated to Kubernetes** (Phase 4 of k8s migration) -- Deployed via ArgoCD in `miniflux` namespace -- Database connection via internal k8s DNS to CloudNativePG cluster -- Exposed via Tailscale Ingress at feed.tail8d86e.ts.net -- Removed brew miniflux service and ansible role from indri -- Fixed table ownership issue after P3 restore (tables were owned by eblume, needed to be owned by miniflux) - -### Thu Jan 16 2026 - -- Initial setup with Miniflux 2.x on brew -- Connected to PostgreSQL 18 on localhost -- Exposed via Tailscale at feed.tail8d86e.ts.net diff --git a/docs/zk/minikube.md b/docs/zk/minikube.md deleted file mode 100644 index 22272f0..0000000 --- a/docs/zk/minikube.md +++ /dev/null @@ -1,133 +0,0 @@ ---- -id: minikube -tags: -- blumeops ---- - -# Minikube Management Log - -Minikube provides a single-node Kubernetes cluster on Indri for running containerized services. - -## Cluster Details - -- Driver: **docker** (runs as container inside Docker Desktop) -- Container runtime: docker -- Kubernetes version: v1.34.0 -- Resources: 6 CPUs, 11GB RAM (leaves 1GB for Docker Desktop overhead), 200GB disk -- API server: https://k8s.tail8d86e.ts.net (Tailscale service with TCP passthrough) -- Internal port: dynamic (currently 50820 - Docker maps random host port to container's 6443) - -**Prerequisites:** Docker Desktop must be installed and running with at least 12GB memory allocated. - -## Remote Access from Gilbert - -Run `mise run ensure-minikube-indri-kubectl-config` to set up kubectl access. This script: -1. Fetches certificates from indri via SSH -2. 
Creates kubeconfig at `~/.kube/minikube-indri/config.yml` - -**Fish abbreviations** (in `~/.config/fish/config.fish`): -- `ki` -> `kubectl --context=minikube-indri` -- `k9i` -> `k9s --context=minikube-indri` -- `k9` -> `k9s` - -```bash -# Quick access via abbreviations -ki get nodes -k9i - -# Or explicitly set context -kubectl config use-context minikube-indri -kubectl get nodes -``` - -## Volume Mounting (for P6 kiwix/transmission) - -**Direct NFS from pods to sifaka** - tested and working. - -Docker NATs outbound traffic through indri's LAN IP (192.168.1.50). Sifaka's NFS exports allow: -- `192.168.1.0/24` - Docker containers via indri NAT -- `100.64.0.0/10` - Tailscale clients - -Pods mount NFS directly: -```yaml -volumes: - - name: torrents - nfs: - server: sifaka - path: /volume1/torrents -``` - -No LaunchAgents, no `minikube mount`, no hostPath complexity needed. - -## Useful Commands (on indri) - -```bash -# Cluster status -minikube status - -# Start/stop cluster -minikube start -minikube stop - -# Access dashboard -minikube dashboard - -# SSH into node -minikube ssh - -# View logs -minikube logs - -# Get API server URL (shows current port) -kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}" -``` - -## Registry Mirror (Zot) - -Containerd is configured to use [[zot]] on indri as a pull-through cache for container images. This is managed by the ansible `minikube` role. - -Config location: `/etc/containerd/certs.d//hosts.toml` (inside minikube container) - -With docker driver, uses `host.minikube.internal:5050` to reach zot on the host. 
- -Mirrors configured for: -- `registry.ops.eblu.me` (private images) -- `docker.io` -- `ghcr.io` -- `quay.io` - -To verify the mirror is working: -```bash -# Check zot's cached images -curl -s http://localhost:5050/v2/_catalog | jq -``` - -## Log - -### 2026-01-21 (Docker Driver Migration) -- **Migrated from qemu2 to docker driver** (Phase 5.1) -- qemu2 had Tailscale TCP proxy issue (TLS handshake timeout to VM IP) -- docker driver puts API server on localhost, which Tailscale serve handles correctly -- Removed socket_vmnet, qemu dependencies -- Removed NFS/minikube-mount LaunchAgents (will re-add NFS for P6 with simpler hostPath approach) -- API server port is now dynamic (Docker assigns random host port) -- Ansible role updated to query port and configure tailscale serve accordingly -- Created `mise run ensure-minikube-indri-kubectl-config` for workstation setup - -### 2026-01-21 (QEMU2 Migration - superseded) -- Migrated from podman to qemu2 driver -- Podman driver had fundamental limitations preventing volume mounts -- qemu2 created actual VM with full kernel capabilities -- Volume mounting solution: NFS on host + minikube mount passthrough -- **Issue discovered:** Tailscale TCP proxy to VM IP (192.168.105.2:6443) fails with TLS timeout - -### 2026-01-19 -- Configured CRI-O registry mirror to use zot as pull-through cache -- Added ansible automation to apply mirror config on provisioning -- Fixed ansible hanging: `minikube ssh` with piped stdin requires `--native-ssh=false` - -### 2026-01-18 -- Initial cluster setup for k8s migration Phase 0 -- Configured for remote access with --apiserver-names=indri -- 1Password credential integration for kubectl from gilbert -- Exposed as Tailscale service `k8s.tail8d86e.ts.net` with TCP passthrough diff --git a/docs/zk/navidrome-log.md b/docs/zk/navidrome-log.md deleted file mode 100644 index 45a8f3d..0000000 --- a/docs/zk/navidrome-log.md +++ /dev/null @@ -1,78 +0,0 @@ ---- -id: navidrome-log -tags: -- blumeops -- service 
---- - -Navidrome is a self-hosted music streaming server deployed on [[blumeops|BlumeOps]]. - -# Access - -- **Primary URL**: https://dj.ops.eblu.me (via Caddy) -- **Tailscale URL**: https://dj.tail8d86e.ts.net - -# Deployment - -Navidrome runs in Kubernetes (minikube on [[indri]]) and is managed via [[argocd|ArgoCD]]. - -**Manifests**: `argocd/manifests/navidrome/` - -## Storage - -| Mount | Type | Source | Access | -|---------|-------------------|-------------------------|------------| -| /music | NFS PV | sifaka:/volume1/music | Read-only | -| /data | Local PVC (10Gi) | minikube storage class | Read-write | - -The `/data` directory contains: -- SQLite database -- Configuration -- Cache files - -## Configuration - -Environment variables set in deployment: -- `ND_SCANSCHEDULE=1h` - Rescan library every hour -- `ND_LOGLEVEL=info` - Standard logging level -- `ND_MUSICFOLDER=/music` - Music library path -- `ND_DATAFOLDER=/data` - Data directory path - -## Initial Setup - -On first access, Navidrome will prompt to create an admin user. No default credentials. 
- -# Operations - -## Sync Application - -```bash -argocd app sync navidrome -``` - -## Check Status - -```bash -argocd app get navidrome -kubectl --context=minikube-indri -n navidrome get pods -kubectl --context=minikube-indri -n navidrome logs deploy/navidrome -``` - -## Verify NFS Mount - -```bash -kubectl --context=minikube-indri -n navidrome exec deploy/navidrome -- ls /music -``` - -## Force Library Rescan - -Access Settings > Library in the web UI, or trigger via API: -```bash -curl -X POST https://dj.ops.eblu.me/api/library/scan -H "x-nd-authorization: Bearer " -``` - -# Related - -- [[jellyfin]] - Video streaming (runs on indri directly) -- [[argocd]] - GitOps deployment -- [[blumeops]] - Infrastructure overview diff --git a/docs/zk/postgresql-log.md b/docs/zk/postgresql-log.md deleted file mode 100644 index 2df2553..0000000 --- a/docs/zk/postgresql-log.md +++ /dev/null @@ -1,127 +0,0 @@ ---- -id: postgresql-log -tags: -- blumeops ---- - -# PostgreSQL Management Log - -PostgreSQL database cluster running in Kubernetes (minikube on indri) via CloudNativePG operator, providing storage for [[miniflux]] and other services. 
- -## Quick Connect - -```bash -# Connect as superuser (fetches password from 1Password) -PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -d miniflux -``` - -## Service Details - -- URL: tcp://pg.tail8d86e.ts.net:5432 -- Metrics: http://cnpg-metrics.tail8d86e.ts.net:9187/metrics -- Namespace: databases -- Cluster name: blumeops-pg -- Operator: CloudNativePG -- ArgoCD app: blumeops-pg - -## Databases - -| Database | Owner | Purpose | -|----------|----------|----------------------------| -| miniflux | miniflux | Miniflux feed reader data | - -## Users - -| User | Role | Purpose | -|-----------|------------------|------------------------| -| postgres | superuser | CNPG internal | -| miniflux | app owner | Owns miniflux database | -| eblume | superuser | Admin access | -| borgmatic | pg_read_all_data | Backup access | - -## Useful Commands - -```bash -# List databases -PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\l" - -# List users -PGPASSWORD=$(op --vault blumeops item get guxu3j7ajhjyey6xxl2ovsl2ui --fields password --reveal) psql -h pg.tail8d86e.ts.net -U eblume -c "\du" - -# View CNPG cluster status -kubectl -n databases get cluster blumeops-pg - -# View pod logs -kubectl -n databases logs -f blumeops-pg-1 -``` - -## Backup - -PostgreSQL data is backed up via borgmatic from indri using the `postgresql_databases` hook, which streams pg_dump directly to Borg for consistent backups. - -Borgmatic config (`~/.config/borgmatic/config.yaml`): -```yaml -postgresql_databases: - - name: miniflux - hostname: pg.tail8d86e.ts.net - port: 5432 - username: borgmatic -``` - -Password is read from `~/.pgpass` (managed by borgmatic ansible role). 
- -## ArgoCD Management - -```bash -# Sync cluster changes -argocd app sync blumeops-pg - -# Force reconcile -kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite -``` - -**Files:** -- Cluster spec: `argocd/manifests/databases/blumeops-pg.yaml` -- Tailscale service: `argocd/manifests/databases/service-tailscale.yaml` -- Secrets: `secret-eblume.yaml.tpl`, `secret-borgmatic.yaml.tpl` (via `op inject`) - -## Credentials - -**1Password items:** -- `guxu3j7ajhjyey6xxl2ovsl2ui` - eblume superuser password -- `mw2bv5we7woicjza7hc6s44yvy` - borgmatic user password - -**CNPG-managed secrets:** -- `blumeops-pg-app` - miniflux user (auto-generated password) -- `blumeops-pg-eblume` - eblume superuser -- `blumeops-pg-borgmatic` - borgmatic backup user - -## Log - -### Wed Jan 22 2026 - -- Added CNPG metrics collection via Tailscale service at `cnpg-metrics.tail8d86e.ts.net:9187` -- Updated PostgreSQL Grafana dashboard to use CNPG metric names (`cnpg_*` prefix) -- Prometheus on indri now scrapes CNPG metrics directly - -### Sun Jan 19 2026 (P4) - -- **Retired brew PostgreSQL** - k8s CloudNativePG is now the only PostgreSQL -- Renamed Tailscale hostname from `k8s-pg` to `pg` (canonical) -- Removed postgresql ansible role from indri -- Moved .pgpass management to borgmatic role -- Updated borgmatic to backup only `pg.tail8d86e.ts.net` -- Fixed table ownership issue: P3 restore created tables owned by eblume, transferred to miniflux - -### Sun Jan 19 2026 (P3) - -- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg -- Added borgmatic user to k8s-pg via CloudNativePG managed roles -- Both brew and k8s PostgreSQL backed up by borgmatic during migration -- Added Tailscale ACL: `tag:homelab` → `tag:k8s` on port 5432 for backup access - -### Thu Jan 16 2026 - -- Initial setup with PostgreSQL 18 (brew) -- Created miniflux database and user -- Exposed via Tailscale at pg.tail8d86e.ts.net diff --git 
a/docs/zk/pulumi.md b/docs/zk/pulumi.md deleted file mode 100644 index 422f83d..0000000 --- a/docs/zk/pulumi.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -id: pulumi -tags: -- blumeops ---- - -# Pulumi Tailnet IaC Management Log - -Pulumi manages the tail8d86e.ts.net tailnet configuration, including ACLs, tags, and DNS settings. - -## Architecture - -Two-layer approach: -- **Layer 1 (Pulumi)**: Tailnet-wide config - ACLs, tags, DNS (this card) -- **Layer 2 (Ansible)**: Node-local `tailscale serve` config - see `tailscale_serve` role - -## Service Details - -- State backend: Pulumi Cloud (https://app.pulumi.com/eblume/blumeops-tailnet) -- Stack: `tail8d86e` -- Config directory: `pulumi/` in blumeops repo -- Policy file: `pulumi/policy.hujson` (HuJSON with comments) - -## Authentication - -Uses OAuth client stored in 1Password (blumeops vault): -- Client configured with scopes: acl, dns, devices, services -- Auto-applies `tag:blumeops` to IaC-managed resources - -## Useful Commands - -```bash -# Preview changes -mise run tailnet-preview - -# Apply changes -mise run tailnet-up - -# View current state -mise run tailnet-preview - -# Pass additional args -mise run tailnet-up -- --yes -``` - -## Making ACL Changes - -1. Edit `pulumi/policy.hujson` in the blumeops repo -2. Run `mise run tailnet-preview` to see what will change -3. Run `mise run tailnet-up` to apply -4. 
Commit and push - -## What's Managed - -Currently managed by Pulumi: -- ACL policy (`tailscale:index:Acl`) - -Can be added later: -- DNS nameservers (`tailscale:index:DnsNameservers`) -- DNS search paths (`tailscale:index:DnsSearchPaths`) -- Tailnet settings (`tailscale:index:TailnetSettings`) - -## Log - -### Wed Jan 15 2026 - -- Initial setup with Pulumi + Python -- Imported existing ACL from Tailscale -- State stored in Pulumi Cloud (free tier) -- OAuth authentication via 1Password diff --git a/docs/zk/teslamate-log.md b/docs/zk/teslamate-log.md deleted file mode 100644 index 7a814dc..0000000 --- a/docs/zk/teslamate-log.md +++ /dev/null @@ -1,110 +0,0 @@ ---- -id: teslamate-log -tags: -- blumeops ---- - -# TeslaMate - -TeslaMate is a self-hosted Tesla data logger running in Kubernetes (minikube on indri), collecting and visualizing vehicle data from the Tesla Owner API. - -## Service Details - -- URL: https://tesla.tail8d86e.ts.net -- Namespace: `teslamate` -- Image: `teslamate/teslamate:2.2.0` -- Database: [[postgresql]] (CloudNativePG cluster at pg.tail8d86e.ts.net) -- ArgoCD app: `teslamate` - -## What TeslaMate Collects - -- Battery level, state of charge, range estimates -- Charging sessions (location, energy, cost, duration) -- Drives (distance, efficiency, routes) -- Climate/HVAC usage -- Software update history -- Vampire drain analysis -- Vehicle states (asleep, driving, charging, online) - -## Grafana Dashboards - -18 dashboards available in Grafana under the "TeslaMate" folder at https://grafana.tail8d86e.ts.net: - -- Overview, Charges, Drives, Efficiency, States -- Battery Health, Vampire Drain, Statistics -- Charge Level, Locations, Trip, Mileage -- Drive Stats, Charging Stats, Projected Range -- Timeline, Updates, Visited - -Dashboards use the `TeslaMate` PostgreSQL datasource (not Prometheus). 
- -## Useful Commands - -```bash -# View logs -kubectl --context=minikube-indri -n teslamate logs -f deployment/teslamate - -# Check pod status -kubectl --context=minikube-indri -n teslamate get pods - -# Restart deployment -kubectl --context=minikube-indri -n teslamate rollout restart deployment/teslamate - -# Sync from ArgoCD -argocd app sync teslamate -``` - -## Credentials - -**1Password items (blumeops vault):** -- `TeslaMate` - contains `db_password` and `api_enc_key` fields - -**Kubernetes secrets:** -- `teslamate-db` (teslamate ns) - DATABASE_PASS for PostgreSQL connection -- `teslamate-encryption` (teslamate ns) - ENCRYPTION_KEY for token encryption -- `blumeops-pg-teslamate` (databases ns) - CloudNativePG managed role password -- `grafana-teslamate-datasource` (monitoring ns) - Grafana datasource password - -## Backup - -TeslaMate data is backed up via [[borgmatic]]: -- PostgreSQL database `teslamate` included in `borgmatic_postgresql_databases` -- Backed up alongside miniflux to sifaka NAS - -## Tesla API Authentication - -TeslaMate uses Tesla's Owner API (not Fleet API) via OAuth: - -1. Access https://tesla.tail8d86e.ts.net -2. Click "Sign in with Tesla" -3. Complete OAuth flow in browser -4. Tokens are encrypted with ENCRYPTION_KEY and stored in database -5. 
TeslaMate automatically refreshes tokens as needed - -**Standalone OAuth tool:** If you need to manually obtain tokens, there's a Rust-based helper: -- Mirror: https://forge.tail8d86e.ts.net/eblume/tesla_auth.git -- Runs OAuth flow and outputs access/refresh tokens - -## Database Notes - -- TeslaMate requires PostgreSQL 17.3+ or 18.x -- The `teslamate` user has superuser privileges (required for extension management during migrations) -- Extensions used: `cube`, `earthdistance` (for geospatial calculations) - -## Related - -- [[1767747119-YCPO|BlumeOps]] -- [[argocd|ArgoCD]] -- [[postgresql|PostgreSQL]] -- [[borgmatic|Borgmatic]] - -## Log - -### Thu Jan 23 2026 - -- Initial deployment to Kubernetes -- 18 Grafana dashboards imported from TeslaMate project -- Upgraded CloudNativePG 1.25 -> 1.28 for major version upgrade support -- Upgraded PostgreSQL 17.2 -> 18.1 (required for TeslaMate 2.2.0) -- Tailscale Ingress at `tesla.tail8d86e.ts.net` -- Backup configuration added to borgmatic diff --git a/docs/zk/transmission-log.md b/docs/zk/transmission-log.md deleted file mode 100644 index 794bb53..0000000 --- a/docs/zk/transmission-log.md +++ /dev/null @@ -1,98 +0,0 @@ ---- -id: transmission-log -tags: - - blumeops ---- - -# Transmission Management Log - -Transmission is a BitTorrent daemon running in Kubernetes, primarily used to download large ZIM archives for [[kiwix|Kiwix]]. 
- -## Service Details - -- URL: https://torrent.tail8d86e.ts.net -- Namespace: `torrent` -- Image: `lscr.io/linuxserver/transmission:latest` -- ArgoCD app: `torrent` -- Storage: NFS PVC from sifaka (`/volume1/torrents`) - -## Useful Commands - -```bash -# View transmission logs -kubectl --context=minikube-indri -n torrent logs -f deployment/transmission - -# Check RPC connectivity (from another pod) -kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \ - curl -s http://transmission.torrent.svc.cluster.local:9091/transmission/rpc - -# Sync from ArgoCD -argocd app sync torrent -``` - -## ArgoCD Management - -Transmission is deployed via ArgoCD from `argocd/manifests/torrent/`: -- `deployment.yaml` - Transmission container with NFS volume -- `service.yaml` - ClusterIP service (port 9091) -- `ingress-tailscale.yaml` - Tailscale Ingress for web UI -- `pv-nfs.yaml` - NFS PersistentVolume -- `pvc.yaml` - PersistentVolumeClaim - -## Storage Layout - -The NFS share on sifaka (`/volume1/torrents`) has this structure: -- `/downloads/` - Active downloads and torrent metadata -- `/downloads/complete/` - Completed downloads -- `/config/` - Transmission configuration -- `/watch/` - Watch directory for .torrent files - -Kiwix reads from `/downloads/complete/` to serve ZIM archives. - -## Integration with Kiwix - -The [[kiwix]] deployment includes a torrent-sync sidecar that: -1. Reads the declarative ZIM torrent list from a ConfigMap -2. Adds missing torrents to Transmission via RPC -3. Runs on startup and every 30 minutes - -When downloads complete: -1. Transmission moves files to `/downloads/complete/` -2. The zim-watcher CronJob (in kiwix namespace) detects new ZIMs -3. Kiwix deployment is restarted to pick up new archives - -## Monitoring - -**TODO:** Write custom transmission exporter. 
Existing exporters (`metalmatze/transmission-exporter`, `sandrotosi/simple_transmission_exporter`) are incompatible with Transmission 4's changed JSON API (type mismatches in `lastScrapeTimedOut` field). - -Current monitoring via web UI at https://torrent.tail8d86e.ts.net: -- Active/seeding/paused torrent counts -- Upload/download speeds -- Disk usage - -Basic uptime monitoring via blackbox probe in [[alloy|Alloy k8s]] (see Services Health dashboard). - -## Log - -### 2026-01-22 - -- Attempted to add `metalmatze/transmission-exporter` sidecar for Prometheus metrics -- Exporter failed with JSON parsing errors - incompatible with Transmission 4 API changes -- Removed exporter sidecar, dashboard, and Prometheus scrape config -- Added basic HTTP probe via Alloy k8s blackbox exporter instead -- Deleted stale `transmission.prom` textfile from indri - -### 2026-01-21 (P6) -- **Migrated to Kubernetes** (Phase 6 of k8s migration) -- NFS PersistentVolume for storage on sifaka -- Tailscale Ingress at `torrent.tail8d86e.ts.net` -- RPC accessible to kiwix namespace for torrent sync -- Moved existing ZIM files to `/downloads/complete/` for seeding -- Retired ansible transmission role from indri - -### 2026-01-14 -- Added transmission role to ansible playbook -- Integrated with kiwix role for torrent-based ZIM downloads -- Initial setup with transmission-cli via homebrew -- Managed via brew services on port 9091 -- Metrics collected via textfile exporter diff --git a/docs/zk/zot-log.md b/docs/zk/zot-log.md deleted file mode 100644 index 5950a8b..0000000 --- a/docs/zk/zot-log.md +++ /dev/null @@ -1,109 +0,0 @@ ---- -id: zot-log -tags: -- blumeops ---- - -# Zot Registry Management Log - -Zot is an OCI-native container registry running on Indri, providing: -1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits) -2. 
Private image storage for custom-built containers - -## Service Details - -- URL: https://registry.ops.eblu.me -- Local port: 5050 -- Data directory: ~/zot -- Config: ~/.config/zot/config.json -- Managed via: mcquack LaunchAgent - -## Namespace Convention - -| Path | Source | -|------|--------| -| `registry.../docker.io/*` | Cached from Docker Hub | -| `registry.../ghcr.io/*` | Cached from GHCR | -| `registry.../quay.io/*` | Cached from Quay | -| `registry.../blumeops/*` | Private images (yours) | - -## How It Works - -### Pull-Through Cache (Automatic) - -When [[minikube]] pulls an image like `docker.io/library/nginx:latest`: -1. Containerd checks zot first (via `host.minikube.internal:5050`) -2. If zot has it cached, returns immediately -3. If not, zot fetches from upstream, caches it, returns to k8s - -Cached images appear under their original registry path (e.g., `docker.io/library/nginx`). - -### Private Images (Manual Push) - -Build and push from gilbert using podman: -```bash -# Build -podman build -t registry.ops.eblu.me/blumeops/myapp:v1 . - -# Push to zot -podman push registry.ops.eblu.me/blumeops/myapp:v1 - -# Use in k8s manifest -image: registry.ops.eblu.me/blumeops/myapp:v1 -``` - -Private images go under `blumeops/*` namespace. Example: the devpi container is at `registry.ops.eblu.me/blumeops/devpi:latest`. - -### Security Model - -**Network access only** - no authentication configured. Anyone who can reach zot via Tailscale ACL can push/pull any image. Defense is the tailnet boundary. - -Zot supports htpasswd/LDAP/OIDC auth if needed in the future. - -## Minikube Integration - -The [[minikube]] cluster uses zot as a registry mirror via containerd configuration. Managed by the ansible `minikube` role. - -From inside minikube, zot is at `host.minikube.internal:5050`. Containerd tries the mirror first, falls back to upstream if not cached. 
-
-Mirrors configured for: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io`
-
-## Useful Commands
-
-```bash
-# List all cached/pushed images
-curl -s http://indri:5050/v2/_catalog | jq
-
-# List tags for an image
-curl -s http://indri:5050/v2/blumeops/devpi/tags/list | jq
-
-# Check service status
-ssh indri 'launchctl list | grep zot'
-
-# View logs
-ssh indri 'tail -f ~/Library/Logs/mcquack.zot.err.log'
-```
-
-## Log
-
-### 2026-01-25
-- **Migrated from Tailscale serve to Caddy** - now accessible at `registry.ops.eblu.me`
-- Retired `tailscale_serve` ansible role (no longer needed)
-- Updated minikube containerd config to use new URL
-- Updated CI workflows and mise tasks
-- Old URL (`registry.tail8d86e.ts.net`) deprecated
-
-### 2026-01-21
-- Verified full workflow: podman build on gilbert → push to zot → k8s pull
-- Documented security model (network-only auth via Tailscale ACL)
-- Updated minikube integration: now uses containerd (docker driver) instead of CRI-O (podman driver)
-- Mirror endpoint changed from `host.containers.internal:5050` to `host.minikube.internal:5050`
-
-### 2026-01-19
-- Integrated with minikube as CRI-O registry mirror
-- All k8s image pulls now go through zot cache automatically
-
-### 2026-01-18
-- Initial setup for k8s migration Phase 0
-- Configured pull-through cache for Docker Hub, GHCR, Quay
-- Exposed via Tailscale service at registry.tail8d86e.ts.net
diff --git a/mise-tasks/zk-docs b/mise-tasks/zk-docs
index 2ce9bb3..6eca11b 100755
--- a/mise-tasks/zk-docs
+++ b/mise-tasks/zk-docs
@@ -1,15 +1,19 @@
 #!/usr/bin/env bash
-#MISE description="Concatenate all blumeops zettelkasten cards"
+#MISE description="Prime AI context with key BlumeOps documentation"
 
 set -euo pipefail
 
-# Blumeops docs now live in the repo itself (symlinked into zk)
-DOCS_DIR="$(cd "$(dirname "$0")/.." && pwd)/docs/zk"
-MAIN_CARD="$DOCS_DIR/1767747119-YCPO.md"
+DOCS_DIR="$(cd "$(dirname "$0")/.."
&& pwd)/docs"
 
-# Find all files tagged with blumeops (excluding main card)
-other_cards=$(grep -l '^ - blumeops$' "$DOCS_DIR"/*.md 2>/dev/null | grep -v "$(basename "$MAIN_CARD")" | sort)
+# Key files for AI context priming, in order of importance
+FILES=(
+  "$DOCS_DIR/tutorials/ai-assistance-guide.md"
+  "$DOCS_DIR/reference/index.md"
+  "$DOCS_DIR/how-to/index.md"
+  "$DOCS_DIR/explanation/architecture.md"
+  "$DOCS_DIR/tutorials/index.md"
+)
 
-# Concatenate: main card first, then others
+# Concatenate files with headers showing paths
 # Pass through any args to bat (e.g., --style=header --color=never --decorations=always)
-bat "$@" "$MAIN_CARD" $other_cards
+bat "$@" "${FILES[@]}"
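
Review note: the rewritten `mise-tasks/zk-docs` passes a fixed list of paths to `bat`, so a renamed or deleted doc would presumably make the priming step fail or print an error. A minimal sketch of one way to guard against that is below. The `existing_files` helper and its warning text are hypothetical and not part of this change; only `FILES` and the `bat "$@"` call come from the diff.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the diff above: echo each argument that
# exists as a regular file, and warn on stderr about each one that does not.
existing_files() {
  local f
  for f in "$@"; do
    if [[ -f "$f" ]]; then
      printf '%s\n' "$f"
    else
      printf 'warning: missing doc file: %s\n' "$f" >&2
    fi
  done
}

# Sketch of how the script's final line could use it (assumes no path
# contains a newline):
#   present=()
#   while IFS= read -r f; do present+=("$f"); done < <(existing_files "${FILES[@]}")
#   if (( ${#present[@]} > 0 )); then bat "$@" "${present[@]}"; fi
```

The `while read` loop is used instead of `mapfile` so the sketch also works on older bash releases that lack that builtin. Whether a missing file should be a warning or a hard failure is a design choice left to the author.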