Merge pull request 'heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)' (#371) from heph-offline-access into main

heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)

The heph CLI requests scope "openid offline_access", but the Authentik heph OAuth2 provider only mapped openid/email/profile. Without the offline_access mapping, the issued refresh token is bound to the login session rather than to the 30-day refresh-token window. Once the session lapses, hephd's refresh_token grant returns 400 Bad Request and spoke sync silently degrades (heph sync --status -> auth_failure: true).

Add the built-in offline_access scope mapping to the provider's property_mappings and document the requirement in the service reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:29:47 -07:00 · 2026-06-06 18:07:13 -07:00 · 2026-06-05 08:22:46 -07:00 · 2026-06-05 07:40:51 -07:00 · 2026-06-05 07:30:31 -07:00 · 2026-06-05 06:46:58 -07:00
784 changed files with 60742 additions and 36990 deletions
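To confirm the fix by hand, you can check which scopes a token response actually grants and exercise the refresh_token grant directly. The sketch below assumes bash and curl. AUTHENTIK_URL, CLIENT_ID and REFRESH_TOKEN are placeholders, and the /application/o/token/ path is assumed to be Authentik's OAuth2 token endpoint (confirm it against the provider's OpenID configuration). Per RFC 6749 §5.1, a token response may omit its scope field when it matches the requested scope, so a missing field does not settle the question either way.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for checking the offline_access fix by hand.

# has_scope GRANTED WANTED
# Succeeds if the space-delimited scope string GRANTED contains WANTED.
has_scope() {
  local granted="$1" wanted="$2" s
  for s in $granted; do
    if [ "$s" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# refresh_access_token
# Sends an RFC 6749 section 6 refresh_token grant as a form-encoded POST.
# AUTHENTIK_URL, CLIENT_ID and REFRESH_TOKEN are placeholders the caller must
# set; the endpoint path is an assumption about Authentik, not taken from
# this repository.
refresh_access_token() {
  curl -sS "${AUTHENTIK_URL}/application/o/token/" \
    --data-urlencode "grant_type=refresh_token" \
    --data-urlencode "client_id=${CLIENT_ID}" \
    --data-urlencode "refresh_token=${REFRESH_TOKEN}"
}
```

For example, `has_scope "openid email profile" offline_access` fails, which matches the provider's mapping before this change.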
--- a/.claude/agents/change-classifier.md
+++ b/.claude/agents/change-classifier.md
@@ -0,0 +1,62 @@
+---
+name: change-classifier
+description: Classifies proposed changes as C0/C1/C2 before work begins. Use proactively when the user describes a new task or change, before any implementation starts.
+tools: Read, Glob, Grep, Bash
+model: haiku
+permissionMode: dontAsk
+---
+
+You are a change classifier for the BlumeOps infrastructure project. Your job is to assess a proposed change and classify it as C0, C1, or C2 before any work begins.
+
+## Classification Criteria
+
+| Class | Name | When to use | Key trait |
+|-------|------|-------------|-----------|
+| **C0** | Quick Fix | Small, low-risk, fix-forward safe | Direct to main, no PR |
+| **C1** | Human Review | Moderate complexity or risk | Feature branch + PR, docs-first |
+| **C2** | Mikado Chain | Multi-phase, multi-session, high complexity | Mikado Branch Invariant |
+
+## Assessment Process
+
+1. Understand what the user wants to change
+2. Identify which files/services are affected — use Glob/Grep to check the blast radius
+3. Assess risk factors:
+   - How many files change?
+   - Are critical services affected (networking, auth, DNS)?
+   - Is the change easily reversible?
+   - Could it cause downtime?
+   - Does it span multiple services or systems?
+   - Does it require multi-step sequencing?
+4. Classify and explain your reasoning
+
+## C0 Indicators
+- Single file or small number of related files
+- Config value change, version bump, typo fix, doc update
+- No service restart needed, or restart is safe
+- Easy to fix-forward if wrong
+
+## C1 Indicators
+- Multiple files across a service boundary
+- New feature or significant behavior change
+- Could affect service availability
+- Needs human review for correctness
+- Touching Ansible roles, ArgoCD manifests, or routing config
+
+## C2 Indicators
+- Multi-phase work with ordering dependencies
+- Spans multiple sessions or multiple services
+- Requires prerequisite changes before the main goal
+- User explicitly requests Mikado methodology
+- Discovery-heavy work where the full scope isn't known upfront
+
+## Output Format
+
+```
+Classification: C0 / C1 / C2
+Confidence: high / medium / low
+Rationale: <1-2 sentences>
+Blast radius: <files/services affected>
+Risk factors: <key concerns, if any>
+```
+
+If confidence is low, explain what additional information would help. When in doubt, classify one level higher (C0 → C1, C1 → C2).
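The output format and the "when in doubt" rule in change-classifier.md are simple enough to apply mechanically, for example in a wrapper that consumes the agent's report. The shell sketch below is hypothetical: it reads the Classification and Confidence lines from a report in the format above and, assuming "in doubt" means `Confidence: low`, escalates one level with C2 as the ceiling.

```shell
#!/usr/bin/env bash
# Sketch only: post-process a change-classifier report.

# report_field NAME: reads a report on stdin, prints the value of "NAME: value".
report_field() {
  sed -n "s/^$1: //p" | head -n 1
}

# escalate CLASS: one level higher; C2 is already the highest class.
escalate() {
  case "$1" in
    C0) echo C1 ;;
    C1|C2) echo C2 ;;
    *) echo "unknown class: $1" >&2; return 1 ;;
  esac
}

# effective_class REPORT: escalates only when confidence is low (an assumption).
effective_class() {
  local report="$1" cls conf
  cls=$(printf '%s\n' "$report" | report_field Classification)
  conf=$(printf '%s\n' "$report" | report_field Confidence)
  if [ "$conf" = "low" ]; then
    escalate "$cls"
  else
    echo "$cls"
  fi
}
```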
--- a/.claude/agents/infra-health.md
+++ b/.claude/agents/infra-health.md
@@ -0,0 +1,36 @@
+---
+name: infra-health
+description: Infrastructure health monitor. Use proactively after deployments, provisioning, or when the user asks about service status. Runs services-check and diagnoses failures.
+tools: Bash, Read, Grep, Glob
+model: haiku
+permissionMode: dontAsk
+background: true
+---
+
+You are an infrastructure health monitor for the BlumeOps homelab.
+
+When invoked, run the full health check suite and report results:
+
+1. Run `mise run services-check` and capture the full output
+2. Parse the results — identify any FAILED services
+3. For each failure, provide a brief diagnosis:
+   - Is the service process down?
+   - Is it a network/connectivity issue?
+   - Is it an ArgoCD sync issue?
+4. Summarize: total services checked, how many passed, how many failed
+
+If everything is healthy, keep the summary to one line.
+
+If there are failures, group them by category:
+- **Process failures** (service not running)
+- **HTTP failures** (endpoint not responding)
+- **Kubernetes failures** (pod not running, sync issues)
+- **Connectivity failures** (SSH, network)
+
+Do NOT attempt to fix anything. Report findings only.
+
+Context:
+- Services run across indri (Mac Mini, native + minikube), ringtail (NixOS, k3s), and Fly.io
+- Use `--context=minikube-indri` for indri k8s commands, `--context=k3s-ringtail` for ringtail
+- HTTP endpoints are proxied through Caddy at `*.ops.eblu.me`
+- Public endpoints go through Fly.io at `*.eblu.me`
--- a/.claude/agents/mikado-navigator.md
+++ b/.claude/agents/mikado-navigator.md
@@ -0,0 +1,69 @@
+---
+name: mikado-navigator
+description: Mikado chain navigator for C2 changes. Use when resuming a C2 chain, checking chain status, or deciding which leaf node to work next. Understands the Mikado Branch Invariant.
+tools: Read, Glob, Grep, Bash
+model: sonnet
+permissionMode: dontAsk
+---
+
+You are a Mikado chain navigator for the BlumeOps C2 change process. You help the user understand the current state of a Mikado chain and decide what to do next.
+
+## What You Do
+
+1. Run `mise run docs-mikado --resume` to detect the current chain state
+2. Read the relevant Mikado cards (docs in `docs/how-to/` with `status: active`)
+3. Analyze the dependency graph and branch position
+4. Recommend the next action
+
+## Chain State Analysis
+
+After running `docs-mikado --resume`, interpret the output:
+
+- **Planning phase:** Cards are being added, no code yet. Suggest reviewing the dependency graph for completeness.
+- **Mid-cycle:** An `impl` is in progress. Identify which leaf is being worked and what remains.
+- **Between cycles:** A leaf was just closed. Identify the next ready leaf and summarize what it requires.
+- **Finalized:** The chain is complete and awaiting merge.
+- **Invariant violation:** A plan commit was found after impl. Explain the reset procedure.
+
+## Recommending Next Actions
+
+For each ready leaf node:
+1. Read the card content to understand what it requires
+2. Check if there are related source files (manifests, playbooks, configs)
+3. Assess relative complexity and suggest an ordering if multiple leaves are ready
+4. Note any potential risks or dependencies not captured in the card graph
+
+## The Mikado Branch Invariant
+
+The branch must always have this structure:
+```
+main <- [plan commits] <- [impl, close] <- [impl, close] <- ... <- [finalize]
+```
+
+Rules:
+- First N commits are card-only (plan phase)
+- Then repeating cycles of impl + close
+- No card introductions after any code commit
+- New prerequisites require a branch reset
+
+## Output Format
+
+```
+Chain: <name>
+Branch: <branch name>
+Position: <planning / mid-cycle / between-cycles / etc.>
+PR: #<number> (if exists)
+
+Ready leaves:
+  1. <leaf-stem> — <title> — <brief description of work needed>
+  2. ...
+
+Recommendation: <what to do next and why>
+```
+
+## Important
+
+- Do NOT make any changes. You are advisory only.
+- If the user is on `main`, list all active chains and suggest which to resume.
+- If PR comments exist, remind the user to check them with `mise run pr-comments <number>`.
+- Check for stashed work — resets sometimes leave stashed changes.
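The Mikado Branch Invariant in mikado-navigator.md is a regular pattern over commit kinds: plan commits, then impl/close pairs, then an optional finalize. Below is a minimal shell sketch of a checker. It assumes the caller has already labeled each commit, oldest first, as plan, impl, close or finalize; how commits get labeled is not specified here, and the function name is hypothetical. It also accepts a single trailing impl without its close, on the assumption that a mid-cycle branch is still valid.

```shell
#!/usr/bin/env bash
# Sketch only: check a sequence of commit kinds against the invariant
#   plan* (impl close)* [impl | finalize]
# Arguments are commit kinds, oldest first.
check_mikado_invariant() {
  local seq="" kind
  for kind in "$@"; do
    seq="$seq$kind "
  done
  # A plan commit after any impl cannot match; that is the
  # "invariant violation" case which calls for a branch reset.
  printf '%s\n' "$seq" | grep -Eq '^(plan )*(impl close )*(impl |finalize )?$'
}
```

For example, `check_mikado_invariant plan plan impl close plan` fails because a card was introduced after a code commit.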
--- a/.forgejo/workflows/branch-cleanup.yaml
+++ b/.forgejo/workflows/branch-cleanup.yaml
@@ -0,0 +1,40 @@
+# Automated Branch Cleanup
+#
+# Deletes remote branches that have been merged into main and are older
+# than a cutoff (default 30 days). Detects both fast-forward and
+# squash-merged branches via the Forgejo API.
+#
+# Runs on a schedule (~every 10 days) and can be triggered manually
+# with a custom cutoff for testing.
+
+name: Branch Cleanup
+
+on:
+  schedule:
+    # Approximately every 10 days: 1st, 11th, 21st of each month at 06:00 UTC
+    - cron: '0 6 1,11,21 * *'
+  workflow_dispatch:
+    inputs:
+      cutoff:
+        description: 'Delete branches older than N days'
+        required: false
+        default: '30'
+        type: string
+
+jobs:
+  cleanup:
+    runs-on: k8s
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Run branch cleanup
+        env:
+          FORGEJO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          CUTOFF="${{ inputs.cutoff || '30' }}"
+          echo "Running branch cleanup with cutoff=${CUTOFF} days..."
+          uv run --script mise-tasks/branch-cleanup \
+            --remote-only \
+            --yes \
+            --cutoff "$CUTOFF"
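The cleanup rule in branch-cleanup.yaml comes down to two checks per branch: merged into main, and older than the cutoff. The hypothetical shell helper below expresses that rule in whole days. It assumes age is measured from the branch's last commit time in Unix seconds (how mise-tasks/branch-cleanup actually measures age is not shown in this diff) and that "older than" means strictly greater. Note also that the `1,11,21` cron schedule gives 10-day gaps within a month, but 8 to 11 days from the 21st to the next 1st, depending on month length.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether a remote branch is eligible for cleanup.
# is_stale MERGED LAST_COMMIT_EPOCH NOW_EPOCH [CUTOFF_DAYS]
#   MERGED is "true" if the branch was merged into main (fast-forward or squash).
is_stale() {
  local merged="$1" last="$2" now="$3" cutoff_days="${4:-30}" age_days
  if [ "$merged" != "true" ]; then
    return 1
  fi
  # Whole days since the last commit, rounded down.
  age_days=$(( (now - last) / 86400 ))
  [ "$age_days" -gt "$cutoff_days" ]
}
```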
--- a/.forgejo/workflows/build-blumeops.yaml
+++ b/.forgejo/workflows/build-blumeops.yaml
@@ -0,0 +1,291 @@
+# BlumeOps Release Workflow
+#
+# Creates a versioned release of BlumeOps with all build artifacts.
+# Currently includes:
+#   - Documentation site (Quartz static build)
+#   - Changelog (built from towncrier fragments)
+#
+# Usage:
+#   1. Go to Actions > Build BlumeOps > Run workflow
+#   2. Select version bump type (patch/minor/major) or choose specific version
+#   3. The workflow creates a release with attached artifacts
+#
+# Documentation asset URL:
+#   https://forge.eblu.me/eblume/blumeops/releases/download/<tag>/docs-<version>.tar.gz
+
+name: Build BlumeOps
+
+on:
+  workflow_dispatch:
+    inputs:
+      version_type:
+        description: 'Version bump type'
+        required: true
+        default: 'BUMP_PATCH'
+        type: choice
+        options:
+          - BUMP_PATCH
+          - BUMP_MINOR
+          - BUMP_MAJOR
+          - SPECIFIC_VERSION
+      specific_version:
+        description: 'Specific version (only used when version_type is SPECIFIC_VERSION, e.g., v1.2.0)'
+        required: false
+        default: ''
+        type: string
+
+jobs:
+  build:
+    runs-on: k8s
+    steps:
+      - name: Resolve version
+        id: version
+        run: |
+          VERSION_TYPE="${{ inputs.version_type }}"
+          SPECIFIC_VERSION="${{ inputs.specific_version }}"
+
+          # Fetch latest release
+          echo "Fetching latest release..."
+          LATEST=$(curl -s "https://forge.eblu.me/api/v1/repos/eblume/blumeops/releases/latest" | jq -r '.tag_name // empty' || true)
+
+          if [ -z "$LATEST" ]; then
+            LATEST="v0.0.0"
+            echo "No previous releases found, using base version: $LATEST"
+          else
+            echo "Latest release: $LATEST"
+          fi
+
+          # Parse current version components (strip 'v' prefix)
+          CURRENT="${LATEST#v}"
+          MAJOR=$(echo "$CURRENT" | cut -d. -f1)
+          MINOR=$(echo "$CURRENT" | cut -d. -f2)
+          PATCH=$(echo "$CURRENT" | cut -d. -f3)
+
+          case "$VERSION_TYPE" in
+            BUMP_MAJOR)
+              VERSION="v$((MAJOR + 1)).0.0"
+              echo "Bumping major: $LATEST -> $VERSION"
+              ;;
+            BUMP_MINOR)
+              VERSION="v${MAJOR}.$((MINOR + 1)).0"
+              echo "Bumping minor: $LATEST -> $VERSION"
+              ;;
+            BUMP_PATCH)
+              VERSION="v${MAJOR}.${MINOR}.$((PATCH + 1))"
+              echo "Bumping patch: $LATEST -> $VERSION"
+              ;;
+            SPECIFIC_VERSION)
+              if [ -z "$SPECIFIC_VERSION" ]; then
+                echo "Error: specific_version is required when version_type is SPECIFIC_VERSION"
+                exit 1
+              fi
+              # Validate format
+              if [[ ! "$SPECIFIC_VERSION" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+                echo "Error: Version must be in format vX.Y.Z (e.g., v1.0.0)"
+                exit 1
+              fi
+              VERSION="$SPECIFIC_VERSION"
+              echo "Using specific version: $VERSION"
+              ;;
+            *)
+              echo "Error: Unknown version_type: $VERSION_TYPE"
+              exit 1
+              ;;
+          esac
+
+          # Check if this version already exists
+          if curl -sf "https://forge.eblu.me/api/v1/repos/eblume/blumeops/releases/tags/$VERSION" > /dev/null 2>&1; then
+            echo "Error: Release $VERSION already exists"
+            echo "See: https://forge.eblu.me/eblume/blumeops/releases/tag/$VERSION"
+            exit 1
+          fi
+
+          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+          echo "Building BlumeOps release: $VERSION"
+
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0
+
+      - name: Build changelog
+        id: changelog
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+
+          # Run towncrier on the runner so that CHANGELOG.md updates and
+          # fragment deletions appear in the working tree for both the Quartz
+          # build (next step) and the git commit step.
+          # Check if there are any changelog fragments
+          FRAGMENTS=$(find docs/changelog.d -name "*.md" -not -name ".gitkeep" 2>/dev/null | wc -l)
+
+          if [ "$FRAGMENTS" -gt 0 ]; then
+            echo "Found $FRAGMENTS changelog fragments, building changelog..."
+            uvx towncrier build --version "$VERSION" --yes
+            echo "changelog_updated=true" >> "$GITHUB_OUTPUT"
+
+            # Extract the changelog section for this release to include in release body
+            RELEASE_NOTES=$(awk -v ver="$VERSION" '
+              /^## \[/ {
+                if (found) exit
+                if (index($0, "[" ver "]")) found=1
+              }
+              found {print}
+            ' CHANGELOG.md | tail -n +2)
+
+            echo "$RELEASE_NOTES" > /tmp/release_notes.md
+            echo "Release notes extracted for $VERSION"
+          else
+            echo "No changelog fragments found, skipping towncrier"
+            echo "changelog_updated=false" >> "$GITHUB_OUTPUT"
+            echo "" > /tmp/release_notes.md
+          fi
+
+      - name: Build docs
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          TARBALL="docs-${VERSION}.tar.gz"
+          echo "Building docs via Dagger..."
+          # Towncrier already ran on the runner above, so the working tree
+          # has an up-to-date CHANGELOG.md. build-docs now only runs the
+          # Quartz static site build (no towncrier).
+          dagger call build-docs --src=. --version="$VERSION" \
+            export --path="./$TARBALL"
+          echo "Build complete!"
+          ls -lh "$TARBALL"
+
+      - name: Create release
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          TARBALL="docs-${VERSION}.tar.gz"
+          CHANGELOG_UPDATED="${{ steps.changelog.outputs.changelog_updated }}"
+
+          echo "Creating release $VERSION..."
+
+          # Build release body with changelog if available
+          {
+            echo "BlumeOps release $VERSION"
+            echo ""
+
+            if [ "$CHANGELOG_UPDATED" = "true" ] && [ -s /tmp/release_notes.md ]; then
+              echo "## What's Changed"
+              echo ""
+              cat /tmp/release_notes.md
+              echo ""
+            fi
+
+            echo "## Documentation"
+            echo ""
+            echo "Download \`$TARBALL\` directly, or bump \`docs_version\`"
+            echo "in \`ansible/roles/docs/defaults/main.yml\` and run:"
+            echo ""
+            echo "\`\`\`"
+            echo "mise run provision-indri -- --tags docs"
+            echo "\`\`\`"
+          } > /tmp/release_body.txt
+
+          # Use jq to properly escape the body for JSON
+          RELEASE_DATA=$(jq -n \
+            --arg tag "$VERSION" \
+            --arg name "BlumeOps $VERSION" \
+            --rawfile body /tmp/release_body.txt \
+            '{tag_name: $tag, name: $name, body: $body, draft: false, prerelease: false}')
+
+          RELEASE_RESPONSE=$(curl -s \
+            -X POST \
+            -H "Content-Type: application/json" \
+            -H "Authorization: token $GITHUB_TOKEN" \
+            -d "$RELEASE_DATA" \
+            "https://forge.eblu.me/api/v1/repos/eblume/blumeops/releases")
+
+          echo "API Response: $RELEASE_RESPONSE"
+
+          RELEASE_ID=$(echo "$RELEASE_RESPONSE" | jq -r '.id')
+
+          if [ -z "$RELEASE_ID" ] || [ "$RELEASE_ID" = "null" ]; then
+            echo "Error: Failed to create release"
+            exit 1
+          fi
+
+          echo "Created release ID: $RELEASE_ID"
+
+          # Upload the asset
+          echo "Uploading $TARBALL..."
+          UPLOAD_RESPONSE=$(curl -s \
+            -X POST \
+            -H "Content-Type: application/gzip" \
+            -H "Authorization: token $GITHUB_TOKEN" \
+            --data-binary "@$TARBALL" \
+            "https://forge.eblu.me/api/v1/repos/eblume/blumeops/releases/$RELEASE_ID/assets?name=$TARBALL")
+
+          echo "Upload Response: $UPLOAD_RESPONSE"
+          echo ""
+          echo "Release created successfully!"
+
+      - name: Bump docs_version in ansible role
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          DEFAULTS_FILE="ansible/roles/docs/defaults/main.yml"
+
+          echo "Bumping docs_version in $DEFAULTS_FILE to ${VERSION}..."
+          yq -i ".docs_version = \"${VERSION}\"" "$DEFAULTS_FILE"
+
+          echo "Updated defaults:"
+          grep -E "^docs_version:" "$DEFAULTS_FILE"
+
+      - name: Commit release changes
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          CHANGELOG_UPDATED="${{ steps.changelog.outputs.changelog_updated }}"
+
+          # Configure git
+          git config user.name "Forgejo Actions"
+          git config user.email "actions@forge.ops.eblu.me"
+
+          # Stage deployment changes
+          git add ansible/roles/docs/defaults/main.yml
+
+          # Stage changelog changes if updated
+          if [ "$CHANGELOG_UPDATED" = "true" ]; then
+            git add CHANGELOG.md docs/changelog.d/
+          fi
+
+          # Check if there are changes to commit
+          if git diff --cached --quiet; then
+            echo "No changes to commit"
+          else
+            git commit -m "Update docs release to $VERSION
+
+          $([ "$CHANGELOG_UPDATED" = "true" ] && echo "- Built changelog from towncrier fragments")
+
+          [skip ci]"
+
+            # Push to main
+            git push origin HEAD:main
+            echo "Changes committed and pushed"
+          fi
+
+      - name: Summary
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          TARBALL="docs-${VERSION}.tar.gz"
+          echo "================================================"
+          echo "BlumeOps Release: $VERSION"
+          echo "================================================"
+          echo ""
+          echo "Release URL:"
+          echo "  https://forge.eblu.me/eblume/blumeops/releases/tag/$VERSION"
+          echo ""
+          echo "Asset URL:"
+          echo "  https://forge.eblu.me/eblume/blumeops/releases/download/$VERSION/$TARBALL"
+          echo ""
+          echo "To deploy on indri, run from gilbert:"
+          echo "  mise run provision-indri -- --tags docs"
+          echo ""
+          echo "Then purge the Fly.io proxy cache:"
+          echo "  fly ssh console -a blumeops-proxy -C \\"
+          echo "    \"sh -c 'rm -rf /tmp/cache && nginx -s reload'\""
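The Resolve version step in build-blumeops.yaml is plain semantic-version arithmetic. Pulling it into a function makes it easy to exercise locally. The sketch below mirrors the workflow's case statement; next_version is a hypothetical name, and an empty latest tag falls back to v0.0.0 as in the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: compute the next release tag the way the workflow does.
# next_version LATEST_TAG VERSION_TYPE [SPECIFIC_VERSION]
next_version() {
  local latest="${1:-v0.0.0}" kind="$2" specific="${3:-}"
  local cur="${latest#v}"
  local major="${cur%%.*}"
  local rest="${cur#*.}"
  local minor="${rest%%.*}"
  local patch="${rest#*.}"

  case "$kind" in
    BUMP_MAJOR) echo "v$((major + 1)).0.0" ;;
    BUMP_MINOR) echo "v${major}.$((minor + 1)).0" ;;
    BUMP_PATCH) echo "v${major}.${minor}.$((patch + 1))" ;;
    SPECIFIC_VERSION)
      # Same vX.Y.Z check as the workflow.
      if ! printf '%s\n' "$specific" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "Error: Version must be in format vX.Y.Z (e.g., v1.0.0)" >&2
        return 1
      fi
      echo "$specific"
      ;;
    *)
      echo "Error: Unknown version_type: $kind" >&2
      return 1
      ;;
  esac
}
```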
--- a/.forgejo/workflows/build-container.yaml
+++ b/.forgejo/workflows/build-container.yaml
@@ -0,0 +1,202 @@
+# Unified container build workflow
+# Manual dispatch only — use `mise run container-build-and-release <name>`.
+# Shared Dagger helpers (src/blumeops/) make path-based auto-triggers unreliable,
+# so all container builds are triggered explicitly.
+# Routes to the correct runner:
+#   - Dockerfile/Dagger containers build on k8s (indri) via Dagger
+#   - Nix containers build on nix-container-builder (ringtail) via nix-build + skopeo
+name: Build Container
+
+on:
+  workflow_dispatch:
+    inputs:
+      container:
+        description: 'Container name (directory under containers/)'
+        required: true
+        type: string
+      ref:
+        description: 'Commit SHA to build (defaults to current HEAD)'
+        required: false
+        type: string
+
+jobs:
+  detect:
+    runs-on: k8s
+    outputs:
+      dagger: ${{ steps.classify.outputs.dagger }}
+      nix: ${{ steps.classify.outputs.nix }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          ref: ${{ inputs.ref || github.sha }}
+          fetch-depth: 2
+
+      - name: Classify container build type
+        id: classify
+        run: |
+          CHANGED='["${{ inputs.container }}"]'
+          echo "Building container: $CHANGED"
+
+          # Classify each container by build type (a container can appear in both)
+          DAGGER='[]'
+          NIX='[]'
+          for name in $(echo "$CHANGED" | jq -r '.[]'); do
+            has_any=false
+            if [ -f "containers/$name/container.py" ] || [ -f "containers/$name/Dockerfile" ]; then
+              DAGGER=$(echo "$DAGGER" | jq -c --arg n "$name" '. + [$n]')
+              has_any=true
+            fi
+            if [ -f "containers/$name/default.nix" ]; then
+              NIX=$(echo "$NIX" | jq -c --arg n "$name" '. + [$n]')
+              has_any=true
+            fi
+            if [ "$has_any" = "false" ]; then
+              echo "Warning: $name has neither container.py, Dockerfile, nor default.nix — skipping"
+            fi
+          done
+
+          echo "dagger=$DAGGER" >> "$GITHUB_OUTPUT"
+          echo "nix=$NIX" >> "$GITHUB_OUTPUT"
+          echo "Dagger builds: $DAGGER"
+          echo "Nix builds: $NIX"
+
+  build-dagger:
+    needs: detect
+    if: needs.detect.outputs.dagger != '[]'
+    runs-on: k8s
+    env:
+      # Send Dagger OTLP telemetry to Tempo. Without a real backend the
+      # engine's internal proxy returns 500 on /v1/metrics, causing noisy
+      # retry warnings in every build.
+      OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo.tracing.svc.cluster.local:4318
+    strategy:
+      matrix:
+        container: ${{ fromJson(needs.detect.outputs.dagger) }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          ref: ${{ inputs.ref || github.sha }}
+
+      - name: Extract version and SHA
+        id: meta
+        run: |
+          CONTAINER="${{ matrix.container }}"
+
+          # Try native Dagger pipeline (container.py) first, fall back to Dockerfile
+          if [ -f "containers/$CONTAINER/container.py" ]; then
+            VERSION=$(dagger call container-version --container-name="$CONTAINER")
+          elif [ -f "containers/$CONTAINER/Dockerfile" ]; then
+            VERSION=$(grep -m1 '^ARG CONTAINER_APP_VERSION=' \
+              "containers/$CONTAINER/Dockerfile" \
+              | sed 's/^ARG CONTAINER_APP_VERSION=//')
+          fi
+
+          if [ -z "$VERSION" ]; then
+            echo "Error: Could not extract version for $CONTAINER"
+            exit 1
+          fi
+
+          REF="${{ inputs.ref }}"
+          if [ -z "$REF" ]; then
+            REF="${GITHUB_SHA}"
+          fi
+          SHORT_SHA=$(echo "$REF" | head -c 7)
+
+          # Ensure version starts with 'v'
+          case "$VERSION" in
+            v*) ;;
+            *) VERSION="v${VERSION}" ;;
+          esac
+
+          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+          echo "sha=$SHORT_SHA" >> "$GITHUB_OUTPUT"
+          echo "Version: $VERSION, SHA: $SHORT_SHA"
+
+      - name: Publish
+        env:
+          ZOT_CI_API_KEY: ${{ secrets.ZOT_CI_API_KEY }}
+        run: |
+          dagger call publish \
+            --src=. \
+            --container-name=${{ matrix.container }} \
+            --version=${{ steps.meta.outputs.version }} \
+            --commit-sha=${{ steps.meta.outputs.sha }} \
+            --registry-password=env:ZOT_CI_API_KEY
+
+  build-nix:
+    needs: detect
+    if: needs.detect.outputs.nix != '[]'
+    runs-on: nix-container-builder
+    strategy:
+      matrix:
+        container: ${{ fromJson(needs.detect.outputs.nix) }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          ref: ${{ inputs.ref || github.sha }}
+
+      - name: Extract version and SHA
+        id: meta
+        run: |
+          CONTAINER="${{ matrix.container }}"
+          NIX_FILE="containers/$CONTAINER/default.nix"
+
+          # Extract version = "..." from the nix file
+          VERSION=$(grep -m1 '^\s*version\s*=\s*"' "$NIX_FILE" \
+            | sed 's/.*"\(.*\)".*/\1/' || true)
+
+          if [ -z "$VERSION" ]; then
+            echo "Error: No version declaration found in $NIX_FILE"
+            exit 1
+          fi
+
+          REF="${{ inputs.ref }}"
+          if [ -z "$REF" ]; then
+            REF="${GITHUB_SHA}"
+          fi
+          SHORT_SHA=$(echo "$REF" | head -c 7)
+
+          # Ensure version starts with 'v'
+          case "$VERSION" in
+            v*) ;;
+            *) VERSION="v${VERSION}" ;;
+          esac
+
+          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+          echo "sha=$SHORT_SHA" >> "$GITHUB_OUTPUT"
+          echo "Version: $VERSION, SHA: $SHORT_SHA"
+
+      - name: Resolve nixpkgs
+        id: nixpkgs
+        run: |
+          NIXPKGS_PATH=$(nix flake metadata nixpkgs --json | jq -r '.path')
+          echo "Resolved nixpkgs: $NIXPKGS_PATH"
+          echo "path=$NIXPKGS_PATH" >> "$GITHUB_OUTPUT"
+
+      - name: Build with nix
+        env:
+          NIX_PATH: "nixpkgs=${{ steps.nixpkgs.outputs.path }}"
+        run: |
+          echo "Building containers/${{ matrix.container }}/default.nix"
+          echo "NIX_PATH=$NIX_PATH"
+          nix-build "containers/${{ matrix.container }}/default.nix" -o result
+          echo "Build complete: $(readlink result)"
+
+      - name: Push to registry
+        env:
+          ZOT_CI_API_KEY: ${{ secrets.ZOT_CI_API_KEY }}
+        run: |
+          CONTAINER="${{ matrix.container }}"
+          VERSION="${{ steps.meta.outputs.version }}"
+          SHORT_SHA="${{ steps.meta.outputs.sha }}"
+          IMAGE="registry.ops.eblu.me/blumeops/$CONTAINER:${VERSION}-${SHORT_SHA}-nix"
+
+          echo "Pushing to $IMAGE"
+          skopeo copy \
+            --dest-creds="zot-ci:$ZOT_CI_API_KEY" \
+            "docker-archive:result" \
+            "docker://$IMAGE"
+          echo "Push complete: $IMAGE"
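Two small pieces of logic in build-container.yaml decide where a container builds and how its Nix image is tagged. A directory with container.py or a Dockerfile goes to the Dagger job, and a default.nix goes to the Nix job (a container can qualify for both). The Nix image tag combines the v-prefixed version, the first seven characters of the commit, and a -nix suffix. The shell sketch below restates both rules with hypothetical function names. The Dagger image tag is not shown because it is set inside `dagger call publish`, outside this diff.

```shell
#!/usr/bin/env bash
# Sketch only: mirror the detect job and the Nix push step.

# build_kinds DIR: prints "dagger", "nix", "dagger nix", or nothing.
build_kinds() {
  local dir="$1" kinds=""
  if [ -f "$dir/container.py" ] || [ -f "$dir/Dockerfile" ]; then
    kinds="dagger"
  fi
  if [ -f "$dir/default.nix" ]; then
    kinds="${kinds:+$kinds }nix"
  fi
  echo "$kinds"
}

# nix_image_ref CONTAINER VERSION REF: the registry reference skopeo pushes to.
nix_image_ref() {
  local container="$1" version="$2" ref="$3" short_sha
  case "$version" in
    v*) ;;
    *) version="v${version}" ;;
  esac
  # First seven characters of the commit, like `head -c 7` in the workflow.
  short_sha=$(printf '%.7s' "$ref")
  echo "registry.ops.eblu.me/blumeops/${container}:${version}-${short_sha}-nix"
}
```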
--- a/.forgejo/workflows/cv-deploy.yaml
+++ b/.forgejo/workflows/cv-deploy.yaml
@@ -0,0 +1,109 @@
+# CV Deploy Workflow
+#
+# Bumps cv_version in ansible/roles/cv/defaults/main.yml and pushes the change.
+# Deployment to indri is manual (runner has no SSH access to indri):
+#   mise run provision-indri -- --tags cv
+#
+# Usage:
+#   1. Release a new CV package from the cv repo first
+#   2. Go to Actions > Deploy CV > Run workflow
+#   3. Enter the version to deploy, or leave as "latest"
+#   4. Run the command above on gilbert to apply
+
+name: Deploy CV
+
+on:
+  workflow_dispatch:
+    inputs:
+      version:
+        description: 'CV package version to deploy (e.g., v1.0.0, or "latest")'
+        required: true
+        default: 'latest'
+        type: string
+
+jobs:
+  deploy:
+    runs-on: k8s
+    steps:
+      - name: Resolve version
+        id: version
+        run: |
+          INPUT_VERSION="${{ inputs.version }}"
+
+          if [ "$INPUT_VERSION" = "latest" ]; then
+            echo "Resolving latest CV package version..."
+            VERSION=$(curl -s "https://forge.eblu.me/api/v1/packages/eblume?type=generic&q=cv" \
+              | jq -r '[.[] | select(.name == "cv")] | sort_by(.version) | last | .version // empty')
+
+            if [ -z "$VERSION" ]; then
+              echo "Error: No CV packages found"
+              exit 1
+            fi
+            echo "Resolved latest version: $VERSION"
+          else
+            VERSION="$INPUT_VERSION"
+            if [[ ! "$VERSION" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+              echo "Error: Version must be in format vX.Y.Z (e.g., v1.0.0)"
+              exit 1
+            fi
+          fi
+
+          # Verify the package exists
+          TARBALL="cv-${VERSION}.tar.gz"
+          PACKAGE_URL="https://forge.eblu.me/api/packages/eblume/generic/cv/${VERSION}/${TARBALL}"
+          if ! curl -fsSL --head "$PACKAGE_URL" > /dev/null 2>&1; then
+            echo "Error: Package not found at $PACKAGE_URL"
+            echo "Run the 'Release CV' workflow in the cv repo first."
+            exit 1
+          fi
+          echo "Package verified: $PACKAGE_URL"
+          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Bump cv_version in ansible role
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          DEFAULTS_FILE="ansible/roles/cv/defaults/main.yml"
+
+          echo "Bumping cv_version in $DEFAULTS_FILE to ${VERSION}..."
+          yq -i ".cv_version = \"${VERSION}\"" "$DEFAULTS_FILE"
+
+          echo "Updated defaults:"
+          grep -E "^cv_version:" "$DEFAULTS_FILE"
+
+      - name: Commit release changes
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+
+          git config user.name "Forgejo Actions"
+          git config user.email "actions@forge.ops.eblu.me"
+
+          git add ansible/roles/cv/defaults/main.yml
+
+          if git diff --cached --quiet; then
+            echo "No changes to commit (already at $VERSION)"
+          else
+            git commit -m "Update CV release to $VERSION
+
+          [skip ci]"
+            git push origin HEAD:main
+            echo "Changes committed and pushed"
+          fi
+
+      - name: Summary
+        run: |
+          VERSION="${{ steps.version.outputs.version }}"
+          echo "================================================"
+          echo "CV version bumped: $VERSION"
+          echo "================================================"
+          echo ""
+          echo "To deploy on indri, run from gilbert:"
+          echo "  mise run provision-indri -- --tags cv"
+          echo ""
+          echo "Then purge the Fly.io proxy cache:"
+          echo "  fly ssh console -a blumeops-proxy -C \\"
+          echo "    \"sh -c 'rm -rf /tmp/cache && nginx -s reload'\""
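One caveat in the Resolve version step of cv-deploy.yaml: jq's `sort_by` orders strings by codepoint, so v1.10.0 sorts before v1.9.0 and `last` would return v1.9.0. One possible alternative is to let jq only list the versions and pick the newest in the shell. The sketch below assumes the runner has GNU coreutils `sort -V` (version sort). latest_version is a hypothetical name, and the function ignores anything not in vX.Y.Z form, matching the format the manual input path already enforces.

```shell
#!/usr/bin/env bash
# Sketch only: pick the newest vX.Y.Z from a list of versions, compared
# numerically rather than as plain strings.
# latest_version VERSION...
latest_version() {
  printf '%s\n' "$@" \
    | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -V \
    | tail -n 1
}
```

In the workflow it could be fed the output of `jq -r '.[] | select(.name == "cv") | .version'` from the same packages API call.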
--- a/.forgejo/workflows/deploy-fly.yaml
+++ b/.forgejo/workflows/deploy-fly.yaml
@@ -0,0 +1,37 @@
+name: Deploy Fly.io Proxy
+
+on:
+  workflow_dispatch:
+  push:
+    branches: [main]
+    paths:
+      - 'fly/**'
+
+jobs:
+  deploy:
+    runs-on: k8s
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Install flyctl
+        run: |
+          curl -L https://fly.io/install.sh | sh
+          echo "/root/.fly/bin" >> "$GITHUB_PATH"
+
+      - name: Deploy to Fly.io
+        env:
+          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
+        run: |
+          cd fly
+          fly deploy
+
+      - name: Verify health
+        env:
+          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
+        run: |
+          fly status -a blumeops-proxy
+          echo ""
+          echo "Health check:"
+          sleep 10
+          curl -sf https://blumeops-proxy.fly.dev/healthz || echo "Warning: health check failed (may need DNS propagation)"
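The Verify health step waits a fixed 10 seconds, then sends a single `curl -sf` request to `/healthz` and reports any failure only as a warning. A retry loop with exponential backoff is one common alternative. The sketch below is hypothetical: `http_probe`, `wait_for_healthy`, and the attempt and delay defaults are not part of the workflow. It takes the probe and sleep functions as parameters, so the loop can be exercised without network access.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET `url` succeeds with a 2xx status, like `curl -sf`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError/HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_for_healthy(
    probe: Callable[[], bool],
    attempts: int = 6,
    initial_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, doubling the pause between tries."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:  # no pointless sleep after the final try
            sleep(delay)
            delay *= 2
    return False
```

For example, `wait_for_healthy(lambda: http_probe("https://blumeops-proxy.fly.dev/healthz"))` sleeps at most 62 seconds in total (2 + 4 + 8 + 16 + 32) before giving up.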
--- a/.forgejo/workflows/test.yaml
+++ b/.forgejo/workflows/test.yaml
@@ -1,44 +0,0 @@
-name: Test CI
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-  workflow_dispatch:
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-
-      - name: Verify tools
-        run: |
-          echo "=== Node.js ==="
-          node --version
-          npm --version
-          echo ""
-          echo "=== Git ==="
-          git --version
-          echo ""
-          echo "=== Build tools ==="
-          make --version | head -1
-          gcc --version | head -1
-          echo ""
-          echo "=== Docker ==="
-          docker --version
-          echo ""
-          echo "=== Other tools ==="
-          curl --version | head -1
-          jq --version
-
-      - name: Show repo info
-        run: |
-          echo "Repository: ${{ github.repository }}"
-          echo "Event: ${{ github.event_name }}"
-          echo "Ref: ${{ github.ref }}"
-          echo "Branch: ${{ github.ref_name }}"
-          echo ""
-          echo "=== Files ==="
-          ls -la
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1 @@
+/sdk/** linguist-generated
--- a/.github/USE_FORGE_WORKFLOWS.md
+++ b/.github/USE_FORGE_WORKFLOWS.md
@@ -0,0 +1,9 @@
+# .github directory
+
+This directory contains configuration for GitHub-ecosystem tooling only.
+
+**Workflows and actions belong in `.forgejo/`** - this repository uses Forgejo Actions, not GitHub Actions.
+
+## Contents
+
+- `actionlint.yaml` - Configuration for actionlint prek hook (custom runner labels)
--- a/.github/actionlint.yaml
+++ b/.github/actionlint.yaml
@@ -0,0 +1,4 @@
+self-hosted-runner:
+  labels:
+    - k8s
+    - nix-container-builder
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,6 @@
 .claude/settings.local.json
+.claude/agent-memory/
+.claude/scheduled_tasks.lock

 # Python
 __pycache__/
@@ -6,5 +8,10 @@ __pycache__/
 *.pyo
 .venv/

+# Dagger (auto-generated SDK)
+/sdk/
+
 # OS
 .DS_Store
+/**/__pycache__
+/.env
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,89 +0,0 @@
----
-# See https://pre-commit.com for more information
-# Run: uvx pre-commit run --all-files
-# Install: uvx pre-commit install
-
-repos:
-  # General file hygiene
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v6.0.0
-    hooks:
-      - id: trailing-whitespace
-      - id: end-of-file-fixer
-      - id: check-added-large-files
-        args: ['--maxkb=1000']
-      - id: check-merge-conflict
-      - id: check-json
-      - id: check-yaml
-        args: ['--unsafe']  # Allow custom tags (ansible uses them)
-      - id: check-toml
-
-  # Secret detection
-  - repo: https://github.com/trufflesecurity/trufflehog
-    rev: v3.92.5
-    hooks:
-      - id: trufflehog
-        entry: trufflehog git file://. --no-verification --fail
-        stages: [pre-commit, pre-push]
-
-  # YAML linting
-  - repo: https://github.com/adrienverge/yamllint
-    rev: v1.38.0
-    hooks:
-      - id: yamllint
-        args: ['-c', '.yamllint.yaml']
-
-  # Ansible linting
-  - repo: local
-    hooks:
-      - id: ansible-lint
-        name: ansible-lint
-        entry: env ANSIBLE_ROLES_PATH=ansible/roles ansible-lint
-        language: python
-        files: ^ansible/
-        additional_dependencies:
-          - ansible-lint>=26.1.1
-          - ansible-core>=2.15
-
-  # Python - ruff for linting and formatting
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.13
-    hooks:
-      - id: ruff
-        args: ['--fix']
-      - id: ruff-format
-
-  # Shell scripts - shellcheck and shfmt
-  - repo: https://github.com/shellcheck-py/shellcheck-py
-    rev: v0.10.0.1
-    hooks:
-      - id: shellcheck
-        args: ['--severity=warning']
-
-  - repo: https://github.com/scop/pre-commit-shfmt
-    rev: v3.12.0-2
-    hooks:
-      - id: shfmt
-        args: ['-i', '2', '-ci', '-bn']  # 2-space indent, case indent, binary newline
-
-  # TOML - taplo
-  - repo: https://github.com/ComPWA/taplo-pre-commit
-    rev: v0.9.3
-    hooks:
-      - id: taplo-format
-      - id: taplo-lint
-
-  # JSON formatting (prettier for consistent style)
-  - repo: https://github.com/rbubley/mirrors-prettier
-    rev: v3.8.0
-    hooks:
-      - id: prettier
-        types_or: [json]
-        args: ['--tab-width', '2']
-
-  # GitHub/Forgejo Actions workflow linting
-  - repo: https://github.com/rhysd/actionlint
-    rev: v1.7.10
-    hooks:
-      - id: actionlint-system
-        files: ^\.forgejo/workflows/
--- a/.yamllint.yaml
+++ b/.yamllint.yaml
@@ -21,11 +21,11 @@ rules:
  # Required for ansible-lint compatibility
  comments-indentation: false
  octal-values:
-    forbid-implicit-octal: true
+    forbid-implicit-octal: false
    forbid-explicit-octal: true

 ignore:
  - .venv/
  - pulumi/.venv/
  # Third-party k8s manifest with non-standard formatting
-  - argocd/manifests/tailscale-operator/operator.yaml
+  - argocd/manifests/tailscale-operator-base/operator.yaml
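The `.yamllint.yaml` change sets `forbid-implicit-octal` to `false` and keeps `forbid-explicit-octal: true`. An implicit octal is a leading-zero number such as `0644`. A YAML 1.1 loader reads it as octal (420), but the YAML 1.2 core schema reads it as decimal 644. An explicit octal uses the YAML 1.2 `0o644` form. The sketch below approximates how the two settings classify unquoted (plain) scalars. It is a simplification, not yamllint's actual code, and `octal_kind`/`violates` are hypothetical names.

```python
import re
from typing import Optional

_OCTAL_DIGITS = re.compile(r"^[0-7]+$")


def octal_kind(plain_scalar: str) -> Optional[str]:
    """Classify an unquoted scalar as 'explicit' (0o644), 'implicit' (0644) or None."""
    if plain_scalar.startswith("0o"):
        return "explicit" if _OCTAL_DIGITS.match(plain_scalar[2:]) else None
    if len(plain_scalar) > 1 and plain_scalar.startswith("0"):
        return "implicit" if _OCTAL_DIGITS.match(plain_scalar[1:]) else None
    return None


def violates(
    plain_scalar: str,
    forbid_implicit: bool = False,  # new .yamllint.yaml setting
    forbid_explicit: bool = True,
) -> bool:
    """Return True if the scalar would be reported under the given settings."""
    kind = octal_kind(plain_scalar)
    if kind == "implicit":
        return forbid_implicit
    if kind == "explicit":
        return forbid_explicit
    return False
```

With the defaults mirroring the new config, an unquoted `mode: 0644` passes and `mode: 0o644` is reported.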
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,171 @@
+# AGENTS.md
+
+Guidance for AI agents working in this repository. See also [[ai-assistance-guide]].
+
+## Overview
+
+blumeops is Erich Blume's GitOps repository for personal infrastructure, orchestrated via tailnet `tail8d86e.ts.net`.
+
+**CRITICAL: Public repo at github.com/eblume/blumeops - never commit secrets!**
+
+**Shell:** The user's interactive shell may differ from the current harness shell. Prefer repo-safe, non-interactive commands when possible, and match the user's shell conventions when giving interactive examples.
+
+## Rules
+
+1. **Always run `mise run ai-docs` at session start**
+    This will refresh your context with important information you will be assumed to know and follow.
+    **Read the full output** — never truncate, pipe to `head`/`tail`, or skip sections.
+    For problems with a large surface area, ask the user if `mise run ai-sources` should also be run — it concatenates all non-doc source files (~270K tokens) for deep codebase context.
+2. **Always use `--context=minikube-indri` with kubectl** (or `--context=k3s-ringtail` for ringtail services) - work contexts must never be touched
+    **NEVER run `minikube delete`** — it destroys all PVs, etcd, and cluster state. Use `minikube stop`/`minikube start` for restarts. If minikube is stuck, see [[restart-indri]]. Full rebuild from scratch requires the DR procedure in [[rebuild-minikube-cluster]].
+3. **Classify the change as C0/C1/C2 before starting** (see below) — this determines branching and PR requirements
+4. **Feature branches + PRs for C1/C2** - checkout main, pull, create branch, open PR via `tea pr create`. C0 goes direct to main.
+5. **Check PR comments with `mise run pr-comments <pr_number>`** before proceeding
+6. **Add changelog fragments (all change levels)** - `docs/changelog.d/<name>.<type>.md`
+    Types: `feature`, `bugfix`, `infra`, `doc`, `ai`, `misc`
+    Applies to C0, C1, and C2 whenever the change is user-visible or noteworthy.
+    - **C1/C2:** Use branch name: `<branch>.<type>.md`
+    - **C0:** Use orphan prefix: `+<descriptive-slug>.<type>.md` (avoids `main.*` collisions)
+7. **Test before applying** - dry runs (`--check --diff`), syntax checks, `ssh indri '...'`
+8. **Wait for user review before deploying** (C1/C2)
+9. **Never merge PRs or push to main without explicit request** (C0 commits to main are fine)
+10. **Verify deployments** - `mise run services-check`
+
+## Change Classification
+
+Before starting work, classify the change:
+
+| Class | Name | When to use | Key trait |
+|-------|------|-------------|-----------|
+| **C0** | Quick Fix | Small, low-risk, fix-forward safe | Direct to main, no PR |
+| **C1** | Human Review | Moderate complexity or risk | Feature branch + PR, docs-first |
+| **C2** | Mikado Chain | Multi-phase, multi-session, high complexity | Mikado Branch Invariant |
+
+**C0** — commit directly to main. No branch or PR needed. Fix forward if problems arise.
+
+**C1** — feature branch with early PR. Search related docs first, write documentation changes before code, deploy from the unmerged branch (ArgoCD `--revision`, Ansible from checkout). Upgrade to C2 if complexity spirals.
+
+**C2** — branch `mikado/<chain-stem>` governed by the Mikado Branch Invariant: all card commits first, then code progress, then card closures. Commits use `C2(<chain>): plan/impl/close/finalize` convention. Reset the branch when new prerequisites are discovered. Resume with `mise run docs-mikado --resume`.
+
+See [[agent-change-process]] for the full methodology.
+
+## Project Structure
+
+```
+./docs/                 # documentation (Diataxis, Quartz)
+./docs/changelog.d/     # towncrier fragments
+./.dagger/              # dagger pipelines
+./.forgejo/             # forgejo-runner actions and workflows
+./mise-tasks/           # scripts via `mise run`
+./ansible/playbooks/    # ansible (indri.yml primary)
+./ansible/roles/        # indri service roles
+./argocd/apps/          # ArgoCD Application definitions
+./argocd/manifests/     # k8s manifests per service
+./fly/                  # fly.io proxy for public routing
+./pulumi/               # Pulumi IaC (tailnet ACLs, dns, cloud)
+~/.config/{nvim,fish}   # user's shell config, managed by chezmoi
+~/code/personal/        # user's projects
+~/code/personal/zk      # user's zettelkasten (Obsidian-sync). Reference-data source; migrating into heph docs (hephaestus).
+~/code/3rd/             # mirrored external projects
+~/code/work             # FORBIDDEN
+```
+Other code paths are listed by ai-docs; this is just an overview. When you
+encounter wiki-links (`[[like-this]]`), they refer to cards in `docs/`.
+
+## Service Deployment
+
+### Kubernetes (ArgoCD)
+
+Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync). GPU workloads (Frigate, ntfy) run on ringtail's k3s cluster, also managed by ArgoCD.
+
+**PR workflow:**
+1. Create branch, modify `argocd/manifests/<service>/`
+2. Push. Sync 'apps' app if service definition changed (set --revision to branch).
+3. Test on branch: `argocd app set <service> --revision <branch> && argocd app sync <service>`
+4. After merge: `argocd app set <service> --revision main && argocd app sync <service>`
+
+**Commands:** `argocd app list|get|diff|sync <app>`
+
+**Login:** `argocd login argocd.ops.eblu.me --sso` (opens browser for Authentik SSO). Admin fallback for break-glass: `argocd login argocd.ops.eblu.me --username admin --password "$(op read 'op://vg6xf6vvfmoh5hqjjhlhbeoaie/srogeebssulhtb6tnqd7ls6qey/password')"`
+
+### Indri (Ansible)
+
+Native services: Forgejo, Zot, Caddy, Borgmatic, Alloy
+
+```fish
+mise run provision-indri                    # full
+mise run provision-indri -- --tags <role>   # specific
+mise run provision-indri -- --check --diff  # dry run
+```
+
+### Routing
+
+| Domain | Mechanism | Reachable from |
+|--------|-----------|----------------|
+| `*.eblu.me` | Fly.io proxy (Tailscale tunnel) | public internet |
+| `*.ops.eblu.me` | Caddy on indri | k8s pods, containers, tailnet |
+| `*.tail8d86e.ts.net` | Tailscale MagicDNS | tailnet clients only |
+
+Check tailscale serve: `ssh indri 'tailscale serve status --json'`
+
+## Container Releases
+
+```fish
+mise run container-list                       # show images/tags
+mise run container-release <name> <version>   # tag and build
+```
+The goal is to eventually use only locally built containers in all cases, with
+full supply chain control via forge.ops.eblu.me repositories, mirroring source
+from upstream.
+
+**After triggering a build** (manual dispatch or push to main), verify the
+workflow succeeded before proceeding:
+
+```fish
+mise run runner-logs                          # find the run number
+mise run runner-logs <run#>                   # see jobs in the run
+mise run runner-logs <run#> -j <N>            # fetch logs on failure
+```
+
+This also works for other forge repos (`--repo eblume/hermes`).
+
+## Third-Party Projects
+
+Ask user to mirror on forge first, then clone to `~/code/3rd/<project>/`.
+
+### Sporked Projects
+
+Some mirrored projects are "sporked" — a floating-branch soft-fork strategy
+where local patches are continuously rebased on top of upstream. See
+[[spork-strategy]] and [[create-a-spork]] for the full methodology.
+
+Sporked projects live in `~/code/3rd/<project>/` with three remotes:
+`origin` (eblume/ fork on forge), `mirror` (mirrors/ on forge), `upstream`
+(canonical). The `blumeops` branch is the default; `deploy` merges everything.
+
+Create a new spork: `mise run spork-create <mirror-name>`
+
+## Task Discovery
+
+BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`),
+the user's self-hosted context/task system. Fetch them with the CLI:
+
+```fish
+heph list --project Blumeops --json  # outstanding Blumeops tasks as JSON
+```
+
+(This replaced the retired `blumeops-tasks` mise task, which read from Todoist.)
+
+Most operational scripts are stored in `./mise-tasks/`. For scripts with any logic or
+complexity, use `uv run --script` scripts with explicit dependencies. Complex
+workflows with artifacts should become dagger pipelines. Mise tasks are for
+development processes and operations - tools for the user or the agent.
+
+## Credentials
+
+The root credential store is 1Password. Never read secrets directly; use existing
+patterns (ansible pre_tasks, external-secrets, scripts with `op` CLI). It's OK to use
+`op item get` without `--reveal` to explore which secrets are available, however.
+
+Prefer `op read "op://vault/item/field"` over `op item get --fields` to avoid
+quoting issues with multi-line values.
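Rule 6 in AGENTS.md sets the changelog fragment naming. C1/C2 fragments are named `<branch>.<type>.md`, C0 fragments use an orphan `+<descriptive-slug>.<type>.md`, and six types are allowed. The sketch below shows a hypothetical helper that encodes those rules; it is not an existing mise task. AGENTS.md does not say how a branch containing `/`, such as `mikado/<chain-stem>`, maps to a file name, so the sketch refuses such names rather than guessing.

```python
from pathlib import PurePosixPath
from typing import Optional

# Fragment types listed in AGENTS.md rule 6.
FRAGMENT_TYPES = frozenset({"feature", "bugfix", "infra", "doc", "ai", "misc"})
CHANGELOG_DIR = PurePosixPath("docs/changelog.d")


def fragment_path(
    change_class: str,
    fragment_type: str,
    *,
    branch: Optional[str] = None,
    slug: Optional[str] = None,
) -> PurePosixPath:
    """Build a towncrier fragment path following the AGENTS.md naming rules."""
    if fragment_type not in FRAGMENT_TYPES:
        raise ValueError(f"unknown fragment type: {fragment_type!r}")
    if change_class == "C0":
        # C0 commits land on main; the '+' orphan prefix avoids main.* collisions.
        if not slug:
            raise ValueError("C0 fragments need a descriptive slug")
        name = f"+{slug}.{fragment_type}.md"
    elif change_class in ("C1", "C2"):
        if not branch or branch == "main":
            raise ValueError("C1/C2 fragments are named after the feature branch")
        if "/" in branch:
            # The mapping for slashed names (e.g. mikado/<chain-stem>) is unspecified.
            raise ValueError(f"branch {branch!r} contains '/'; choose the name manually")
        name = f"{branch}.{fragment_type}.md"
    else:
        raise ValueError(f"unknown change class: {change_class!r}")
    return CHANGELOG_DIR / name
```

Take a C1 branch named like this PR's `heph-offline-access` with a `bugfix` fragment: `fragment_path("C1", "bugfix", branch="heph-offline-access")` returns `docs/changelog.d/heph-offline-access.bugfix.md`.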
--- a/5
+++ b/5
@@ -1,6 +1,9 @@
 # CLI tools for blumeops management
 brew "actionlint"  # GitHub/Forgejo Actions workflow linter
+brew "age"  # File encryption for 1Password backup (op-backup)
 brew "argocd"  # ArgoCD CLI for GitOps management
 brew "bat"  # Syntax-highlighted file concatenation
-brew "tea"  # Gitea/Forgejo CLI for forge.tail8d86e.ts.net
+brew "mise"  # Task runner and toolchain manager
+brew "tea"  # Gitea/Forgejo CLI for forge.ops.eblu.me
+brew "flyctl"  # Fly.io CLI for public proxy management
 brew "podman"  # Container CLI (uses VM on macOS, for building/pushing images)
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,157 +1 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Project Overview
-
-blumeops is Erich Blume's GitOps repository for personal infrastructure management, orchestrated via tailnet `tail8d86e.ts.net`.
-
-**Critical: This repository is published publicly at https://github.com/eblume/blumeops, so never include any secrets!**
-
-## Rules
-
-1. At the start of every session, even if the user asked to do something else, run `mise run zk-docs -- --style=header --color=never --decorations=always` in order to review the `blumeops` documentation in the zettelkasten (zk). zk lives at `~/code/personal/zk`, and is managed via obsidian-sync (not git).
-
-2. When making any changes, start by making sure you're on the `main` git branch and up-to-date, and then create a feature branch. Commit often while working, and create a PR using:
-```fish
-tea pr create --title "Description of change" --description "$(cat <<'EOF'
-## Summary
-- First change
-- Second change
-
-## Deployment and Testing
-- [x] Done thing one
-- [ ] Needed thing two
-
-🤖 Generated with [Claude Code](https://claude.com/claude-code)
-EOF
-)"
-```
-The user will review your work as you go, and will merge the PR as the last step in the process, even after deploying. After the user reviews the PR and leaves comments, check for unresolved comments with:
-```fish
-mise run pr-comments <pr_number>
-```
-Address each unresolved comment before proceeding. The user will resolve comments on the Forge UI as they are addressed.
-
-3. Always keep the zk cards up to date with any changes, and suggest new links to new cards whenever appropriate. Refer back to the zk docs often during the process of planning and making corrections to ensure accuracy, and if you make a mistake, figure out a way to guard against it using the zk.
-
-4. Use `Brewfile` and `mise.toml` to install tools needed on the development workstation (typically hostnamed "gilbert", username "eblume").
-
-5. Services are hosted either on indri directly (via ansible) or in Kubernetes (via ArgoCD). See the "Service Deployment" section below for details.
-
-6. Try to always test changes before applying them. Use syntax checkers, do dry runs (`--check --diff`), run commands manually via `ssh indri 'some command'`, etc.
-
-7. **Wait for user review before deploying.** After creating a PR, do not run deployment commands until the user has had a chance to review the changes. The user will indicate when they're ready to deploy.
-
-8. After deploying changes, try to verify the result. Use `mise run indri-services-check` to do a general service health check.
-
-## Project Structure
-
-```
-./mise-tasks/           # management and utility scripts run via `mise run`
-./ansible/playbooks/    # ansible playbooks (indri.yml is primary)
-./ansible/roles/        # ansible roles for indri-hosted services
-./argocd/apps/          # ArgoCD Application definitions (app-of-apps pattern)
-./argocd/manifests/     # Kubernetes manifests for each service
-./pulumi/               # Pulumi IaC for tailnet ACLs and cloud resources
-./plans/                # Migration and project planning documents
-~/code/personal/        # projects managed by the user
-~/code/3rd/             # external projects, mirrored or downloaded
-~/code/work             # FORBIDDEN, never go here, avoid searching it
-```
-
-## Service Deployment
-
-### Kubernetes Services (via ArgoCD)
-
-Most services run on `k8s.tail8d86e.ts.net`, via minikube on indri. They are managed via ArgoCD using the app-of-apps pattern:
-
-- **Application definitions**: `argocd/apps/<service>.yaml`
-- **Manifests**: `argocd/manifests/<service>/`
-- **Sync policy**: Manual sync (no auto-sync on git push)
-
-**PR workflow for k8s services:**
-
-1. Create feature branch and add/modify manifests
-2. Push branch to forge
-3. Sync the `apps` application to pick up new Application definitions:
-   ```fish
-   argocd app sync apps
-   ```
-4. Point the service app at the feature branch for testing:
-   ```fish
-   argocd app set <service> --revision feature/branch-name
-   argocd app sync <service>
-   ```
-5. Test the deployment
-6. After PR merge, reset to main and resync:
-   ```fish
-   argocd app set <service> --revision main
-   argocd app sync <service>
-   ```
-
-**Useful commands:**
-```fish
-argocd app list                                        # List all apps
-argocd app get <app>                                   # Get app details
-argocd app diff <app>                                  # Preview changes before sync
-argocd app sync <app>                                  # Sync an app
-kubectl --context=minikube-indri get pods -n <namespace>  # Check pods
-kubectl --context=minikube-indri logs -n <namespace> <pod>  # View logs
-```
-
-Note: The user has fish abbreviations `ki` for `kubectl --context=minikube-indri` and `k9i` for `k9s --context=minikube-indri`, but these only work in interactive shells.
-
-**ArgoCD login (when token expires):**
-```fish
-argocd login argocd.tail8d86e.ts.net --username admin --password "$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get srogeebssulhtb6tnqd7ls6qey --fields password --reveal)"
-```
-
-### Indri Services (via Ansible)
-
-Some services remain on indri outside of Kubernetes:
-- **Zot Registry** - Container registry (k8s depends on it)
-- **Prometheus/Loki** - Observability (must survive k8s failures)
-- **Borgmatic** - Backup system
-- **Grafana Alloy** - Metrics/logs collector
-- **Transmission** - BitTorrent for kiwix downloads
-
-**Deployment:**
-```fish
-mise run provision-indri                    # Full playbook
-mise run provision-indri -- --tags <role>   # Specific role
-mise run provision-indri -- --check --diff  # Dry run
-```
-
-### Tailscale Service Hostnames
-
-When migrating a service from indri to k8s, the Tailscale hostname must be freed:
-
-1. Stop the service on indri
-2. Clear the tailscale serve entry: `ssh indri 'tailscale serve clear svc:<name>'`
-3. Delete the device from Tailscale admin console (user action required)
-4. Deploy the k8s Ingress - it will claim the hostname
-
-Use `ssh indri 'tailscale serve status --json'` to check current serve entries (the non-JSON output may be empty even when entries exist).
-
-## Third-Party Projects
-
-When a task requires cloning or using a third-party git repository (e.g., for building from source), **ask the user to mirror it on forge first**, then clone from the mirror:
-- Mirror location: `https://forge.tail8d86e.ts.net/eblume/<project>.git`
-- Clone to: `~/code/3rd/<project>/`
-
-This avoids external dependencies and ensures the project is available even if the upstream is unreachable.
-
-## Task Discovery
-
-To discover pending blumeops tasks, run:
-
-```fish
-mise run blumeops-tasks
-```
-
-This fetches tasks from the "Blumeops" project in Todoist (via 1Password for API credentials) and displays them sorted by priority: p1 (urgent), p2 (high), p4 (normal/default), p3 (backlog). The typical workflow is to pick a task from this list at the start of a session, then dive in with planning.
-
-## Credentials
-
-The root store for credentials is 1password, which can be accessed via `op --vault <vaultid> item get <itemid> --field fieldname --reveal`, which will prompt the user for their assent and biometrics or password. Typically, use scripts to defer this action - try not to ever grab credentials directly. For instance, the indri.yml playbook starts with `pre_tasks` to gather the relevant secrets needed to provision its services. Some services have their credentials exported to files `chmod 0600` on indri, but they still start out in 1password. In some cases you can test services with a command that grabs the credential, but try to use environment variables or other arrangements to avoid learning the credential yourself, and warn the user first.
+@AGENTS.md
--- a/674
+++ b/674
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<https://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
--- a/README.md
+++ b/README.md
@ -1,89 +1,95 @@
 # blumeops
+aka "Blue Mops"
+
+Tools and configuration for Erich Blume's personal infrastructure, orchestrated
+across a Tailscale tailnet.
+
+This is a homelab, but it's also a testing ground for AI-assisted
+infrastructure development. Much of this codebase was initially co-authored with [Claude
+Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview),
+and the repo places heavy emphasis on documentation, process, and change
+classification to make that collaboration work well. I don't know entirely how
+I feel about LLMs in our current era (there are real concerns about how
+training data is sourced and about energy subsidies), but it felt important to
+learn how to work with these tools.
+
+The full documentation is published at **[docs.eblu.me](https://docs.eblu.me)**
+and lives in the [`docs/`](docs/) directory, structured around the
+[Diataxis](https://diataxis.fr/) framework and designed to be compatible with
+[Obsidian](https://obsidian.md)/[Obsidian.nvim](https://github.com/obsidian-nvim/obsidian.nvim).
+
+## What runs here
+
+Services are a mix of Kubernetes pods (managed by ArgoCD), macOS LaunchAgent
+services (managed by Ansible), and NixOS systemd services (managed by Nix
+flakes), all connected via Tailscale:
+
+- **Indri** (Mac Mini M1) - primary server. Most services run in Minikube via
+  ArgoCD; Forgejo, Caddy, and others run natively as LaunchAgent services via
+  Ansible.
+- **Ringtail** (NixOS desktop, RTX 4080) - runs k3s for GPU workloads (Frigate
+  NVR) and Authentik SSO, plus native NixOS systemd services.
+- **Sifaka** (Synology NAS) - backup target and bulk storage.
+
+Notable services include Grafana/Prometheus/Loki observability, Immich photos,
+Jellyfin media, Forgejo git forge, a Zot container registry, and more. Public
+access is routed through a Fly.io proxy; everything else is tailnet-only.
+
+## Project structure

 ```
-                    l0K                                k..:k.
-                  .:...c.                            ;c....
-                    ....'o                          x.....
-                      ....k                        x....
-                       ... l'                    'c....
-                        ....,l                  o'....
-                         .....x                k....
-                          .....d.             c....
-                            ... l            x....
-                              .,.d         ;c.c'
-                               'c':;      x',c.
-                                .:,'o   .x.::.
-                                 .;:.k ,:.c'
-                                   ,c.c';:.
-                                    .,.:;.
-                                   ;'.c, l
-                                  d',c..:.d.
-                                 O.:;.  'c';c
-                               ;c.c'     .:;.x
-                              o',c.       .;:.k
-                             x.::.          'c.l.
-                         dOKl.c,             .c,'o
-                   0l'...... ..'              .::.ocx.
-                 'o ............              o .... :olx;
-                x,ox;. ....... .k             ....,dKKo;..x
-              'd,OXXXXk:. ...... ;            ;:dXOl;',';l;o;
-             x,oXXXXXXXXXkc. ...              .lc,',':dKNNNx;x;
-           ;o;0KXXXXXXXXXXXX0l.                .',ckNNNNNNNNNxco0d
-          l,d0oOXKOKXXXXKXXXX0.                  kNNNNNNNNNNNNNXxloo::
-             .OXxdXKOX0kXXXX0.                   .KNNNNNNNNNNXONX0o.
-                ,OdxKldXXXXx.                     ,NNNNNNNNNNNKoc
-                   :.OXXkKo                       .kNNNNNNNNXx.
-                      ':0c                         .NdNkXkc
+ansible/            Ansible playbooks and roles (indri, sifaka)
+argocd/apps/        ArgoCD Application definitions
+argocd/manifests/   Kubernetes manifests per service
+containers/         Custom container builds (Dockerfile + Nix)
+docs/               Diataxis documentation (published at docs.eblu.me)
+fly/                Fly.io public proxy configuration
+mise-tasks/         Operational scripts run via mise
+nixos/              NixOS configuration for ringtail
+pulumi/             Pulumi IaC (Tailscale ACLs, Gandi DNS)
+.dagger/            Dagger CI pipelines
+.forgejo/           Forgejo Actions CI/CD workflows
 ```

-*Blue Mops* — GitOps for Erich Blume's personal computing environment.
+## Getting started

-## What is this?
-
-Infrastructure-as-code for my tailnet (`tail8d86e.ts.net`). This repo contains
-ansible playbooks, configuration, and automation for managing my personal
-infrastructure.
-
-This codebase was heavily co-authored by Claude Code, as an experiment in
-LLM-assisted development. I want to include a personal note here that I don't
-know entirely how I feel about LLMs in our current era, but it felt important
-to learn.
-
-## Development
-
-### Pre-commit Hooks
-
-This repo uses [pre-commit](https://pre-commit.com) for code quality and consistency. Install hooks with:
+You'll need [Homebrew](https://brew.sh) and [mise](https://mise.jdx.dev):

 ```bash
-uvx pre-commit install
+brew bundle                    # install CLI tools (argocd, tea, flyctl, etc.)
+mise install                   # install managed toolchains (ansible, pulumi, dagger, etc.)
+prek install                   # set up git hooks
 ```

-Run all hooks manually:
+Git hooks (via [prek](https://github.com/j178/prek)) enforce secret scanning
+(TruffleHog), linting, formatting, and custom checks like doc link validation
+and the Mikado branch invariant. They run automatically on `git commit`.
+
+Operational tasks are driven through mise. Run `mise tasks` to see what's
+available. Key examples:

 ```bash
-uvx pre-commit run --all-files
+mise run provision-indri       # deploy to indri via Ansible
+mise run services-check        # verify service health
+mise run container-list        # list tracked container images
 ```

-Hooks include:
-- **General**: trailing whitespace, end-of-file fixer, large files, merge conflicts
-- **Secrets**: [TruffleHog](https://github.com/trufflesecurity/trufflehog) for secret detection
-- **YAML**: yamllint, ansible-lint
-- **Python**: ruff (linting + formatting)
-- **Shell**: shellcheck, shfmt
-- **TOML**: taplo
-- **JSON**: prettier
+## AI-assisted development

-## CI/CD
+This repo is designed to be worked on by both humans and AI agents. The
+[`AGENTS.md`](AGENTS.md) file provides shared instructions for agentic tools, and the
+[`docs/tutorials/ai-assistance-guide.md`](docs/tutorials/ai-assistance-guide.md)
+explains the full workflow.

-This repo uses [Forgejo Actions](https://forgejo.org/docs/latest/user/actions/) for CI/CD. Workflows live in `.forgejo/workflows/` (not `.github/workflows/`). The runner executes jobs in host mode within the Kubernetes cluster.
+Changes are classified before starting work:

-## Documentation
+- **C0** - quick fixes, committed directly to main
+- **C1** - feature branch + PR, documentation written before code
+- **C2** - multi-phase work using the Mikado method for dependency tracking

-Detailed documentation lives in my personal zettelkasten, which is not included in this repository. You can view the docs with:
+See the [agent change process](docs/explanation/agent-change-process.md) for
+details.

-```bash
-mise run zk-docs
-```
+## License

-The zettelkasten is private at time of writing. If you're interested in the documentation or have questions about this project, please reach out to blume.erich@gmail.com.
+[GPLv3](LICENSE)
--- a/ansible/group_vars/all.yml
+++ b/ansible/group_vars/all.yml
@ -1,2 +0,0 @@
---
-ansible_managed: "Managed by ansible - do not edit. Source: ssh://forgejo@forge.tail8d86e.ts.net/eblume/blumeops.git"
--- a/ansible/inventory/group_vars/all.yml
+++ b/ansible/inventory/group_vars/all.yml
@ -0,0 +1,6 @@
+---
+ansible_managed: "Managed by ansible - do not edit. Source: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git"
+
+# Sifaka NAS exporter ports — shared by caddy (indri) and sifaka_exporters roles
+sifaka_node_exporter_port: 9100
+sifaka_smartctl_exporter_port: 9633
--- a/ansible/inventory/host_vars/sifaka.yml
+++ b/ansible/inventory/host_vars/sifaka.yml
@ -0,0 +1,3 @@
+---
+ansible_user: eblume
+ansible_python_interpreter: /usr/bin/python3
--- a/ansible/inventory/hosts.yml
+++ b/ansible/inventory/hosts.yml
@ -5,6 +5,9 @@ all:
      hosts:
        indri:
          ansible_host: indri
+        ringtail:
+          ansible_host: ringtail
+          ansible_user: eblume
    workstations:
      hosts:
        gilbert:
--- a/ansible/playbooks/indri.yml
+++ b/ansible/playbooks/indri.yml
@ -8,7 +8,7 @@
  pre_tasks:
    - name: Fetch borgmatic database password
      ansible.builtin.command:
-        cmd: op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get mw2bv5we7woicjza7hc6s44yvy --fields db-password --reveal
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/mw2bv5we7woicjza7hc6s44yvy/db-password"
      delegate_to: localhost
      register: _borgmatic_db_pw
      changed_when: false
@ -22,10 +22,26 @@
      no_log: true
      tags: [borgmatic]

+    - name: Fetch BorgBase SSH private key
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/noiobufntsxyzageu7mvlp2nbe/ssh-private-key"
+      delegate_to: localhost
+      register: _borgbase_ssh_key
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [borgmatic]
+
+    - name: Set BorgBase SSH key fact
+      ansible.builtin.set_fact:
+        borgbase_ssh_private_key: "{{ _borgbase_ssh_key.stdout }}"
+      no_log: true
+      tags: [borgmatic]
+
    # Forgejo secrets
    - name: Fetch forgejo LFS JWT secret
      ansible.builtin.command:
-        cmd: op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get w3663ffnvkewbftncqxtcpeavy --fields lfs-jwt-secret --reveal
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/lfs-jwt-secret"
      delegate_to: localhost
      register: _forgejo_lfs_jwt
      changed_when: false
@ -35,7 +51,7 @@

    - name: Fetch forgejo internal token
      ansible.builtin.command:
-        cmd: op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get w3663ffnvkewbftncqxtcpeavy --fields internal-token --reveal
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/internal-token"
      delegate_to: localhost
      register: _forgejo_internal_token
      changed_when: false
@ -45,7 +61,7 @@

    - name: Fetch forgejo OAuth2 JWT secret
      ansible.builtin.command:
-        cmd: op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get w3663ffnvkewbftncqxtcpeavy --fields oauth2-jwt-secret --reveal
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/oauth2-jwt-secret"
      delegate_to: localhost
      register: _forgejo_oauth2_jwt
      changed_when: false
@ -61,6 +77,158 @@
      no_log: true
      tags: [forgejo]

+    # Forgejo Actions secrets (synced to Forgejo via API)
+    - name: Fetch Forgejo API token
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/api-token"
+      delegate_to: localhost
+      register: _forgejo_api_token
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [forgejo_actions_secrets]
+
+    - name: Fetch ArgoCD auth token for Forgejo Actions
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/argocd_token"
+      delegate_to: localhost
+      register: _forgejo_argocd_token
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [forgejo_actions_secrets]
+
+    - name: Fetch Fly.io deploy token for Forgejo Actions
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/on5slfaygtdjrxmdwezyhfmqsq/deploy-token"
+      delegate_to: localhost
+      register: _fly_deploy_token
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [forgejo_actions_secrets]
+
+    - name: Fetch Zot CI API key for Forgejo Actions
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/zot-ci-api"
+      delegate_to: localhost
+      register: _zot_ci_api_key
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [forgejo_actions_secrets]
+
+    - name: Set Forgejo Actions secrets facts
+      ansible.builtin.set_fact:
+        forgejo_api_token: "{{ _forgejo_api_token.stdout }}"
+        forgejo_secret_argocd_token: "{{ _forgejo_argocd_token.stdout }}"
+        forgejo_secret_fly_deploy_token: "{{ _fly_deploy_token.stdout }}"
+        forgejo_secret_zot_ci_api_key: "{{ _zot_ci_api_key.stdout }}"
+      no_log: true
+      tags: [forgejo_actions_secrets]
+
+    # Zot OIDC client secret
+    - name: Fetch zot OIDC client secret
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/oor7os5kapczgpbwv7obkca4y4/zot-client-secret"
+      delegate_to: localhost
+      register: _zot_oidc_secret
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [zot]
+
+    - name: Set zot OIDC client secret fact
+      ansible.builtin.set_fact:
+        zot_oidc_client_secret: "{{ _zot_oidc_secret.stdout }}"
+      no_log: true
+      tags: [zot]
+
+    # Caddy Gandi token for ACME DNS-01 challenges
+    - name: Fetch Gandi PAT for Caddy
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/mco6ka3dc3rmw7zkg2dhia5d2m/pat"
+      delegate_to: localhost
+      register: _caddy_gandi_token
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [caddy]
+
+    - name: Set Caddy Gandi token fact
+      ansible.builtin.set_fact:
+        caddy_gandi_token: "{{ _caddy_gandi_token.stdout }}"
+      no_log: true
+      tags: [caddy]
+
+    # Jellyfin SSO client secret
+    - name: Fetch Jellyfin OIDC client secret
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/oor7os5kapczgpbwv7obkca4y4/jellyfin-client-secret"
+      delegate_to: localhost
+      register: _jellyfin_oidc_secret
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [jellyfin]
+
+    - name: Set Jellyfin OIDC client secret fact
+      ansible.builtin.set_fact:
+        jellyfin_sso_client_secret: "{{ _jellyfin_oidc_secret.stdout }}"
+      no_log: true
+      tags: [jellyfin]
+
+    # Jellyfin API key for metrics collection
+    - name: Fetch Jellyfin API key
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/ceywxkcd3z7najsy2nmmbs2vke/credential"
+      delegate_to: localhost
+      register: _jellyfin_metrics_api_key
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [jellyfin_metrics]
+
+    - name: Set Jellyfin API key fact
+      ansible.builtin.set_fact:
+        jellyfin_metrics_api_key: "{{ _jellyfin_metrics_api_key.stdout }}"
+      no_log: true
+      tags: [jellyfin_metrics]
+
+    # Forgejo API token for metrics collection
+    - name: Fetch Forgejo API token for metrics
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/api-token"
+      delegate_to: localhost
+      register: _forgejo_metrics_api_token
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [forgejo_metrics]
+
+    - name: Set Forgejo metrics API token fact
+      ansible.builtin.set_fact:
+        forgejo_metrics_api_key: "{{ _forgejo_metrics_api_token.stdout }}"
+      no_log: true
+      tags: [forgejo_metrics]
+
+    # Devpi root password (PyPI mirror admin)
+    - name: Fetch devpi root password
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/kyhzfifryqnuk7jeyibmmjvxxm/add more/root password"
+      delegate_to: localhost
+      register: _devpi_root_password
+      changed_when: false
+      no_log: true
+      check_mode: false
+      tags: [devpi]
+
+    - name: Set devpi root password fact
+      ansible.builtin.set_fact:
+        devpi_root_password: "{{ _devpi_root_password.stdout }}"
+      no_log: true
+      tags: [devpi]
+
  roles:
    - role: alloy
      tags: alloy
@ -70,15 +238,29 @@
      tags: borgmatic_metrics
    - role: forgejo
      tags: forgejo
+    - role: forgejo_actions_secrets
+      tags: forgejo_actions_secrets
    - role: zot
      tags: zot
    - role: zot_metrics
      tags: zot_metrics
+    - role: devpi
+      tags: devpi
    - role: minikube
      tags: minikube
    - role: minikube_metrics
      tags: minikube_metrics
-    - role: plex_metrics
-      tags: plex_metrics
-    - role: tailscale_serve
-      tags: tailscale-serve
+    - role: jellyfin
+      tags: jellyfin
+    - role: jellyfin_metrics
+      tags: jellyfin_metrics
+    - role: forgejo_metrics
+      tags: forgejo_metrics
+    - role: cv
+      tags: cv
+    - role: docs
+      tags: docs
+    - role: heph
+      tags: heph
+    - role: caddy
+      tags: caddy
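The `op --vault … item get … --fields … --reveal` calls in this playbook were replaced with `op read` and 1Password secret references of the form `op://<vault>/<item>/[<section>/]<field>`. Segments may contain spaces (the devpi reference uses the section `add more`), which is why each reference is quoted in `cmd`. A minimal Python sketch of building and splitting such references, assuming no segment contains `/`; `SecretRef`, `build_ref`, and `parse_ref` are hypothetical helpers, not part of the 1Password CLI:

```python
from typing import NamedTuple, Optional


class SecretRef(NamedTuple):
    """Components of a 1Password secret reference."""

    vault: str
    item: str
    field: str
    section: Optional[str] = None


def build_ref(ref: SecretRef) -> str:
    """Render a SecretRef as op://<vault>/<item>/[<section>/]<field>."""
    parts = [ref.vault, ref.item]
    if ref.section is not None:
        parts.append(ref.section)
    parts.append(ref.field)
    return "op://" + "/".join(parts)


def parse_ref(uri: str) -> SecretRef:
    """Split an op:// reference; assumes no segment contains '/'."""
    prefix = "op://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a secret reference: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) == 3:
        vault, item, field = parts
        return SecretRef(vault, item, field)
    if len(parts) == 4:
        vault, item, section, field = parts
        return SecretRef(vault, item, field, section)
    raise ValueError(f"expected 3 or 4 segments, got {len(parts)}")


# The devpi reference in the playbook includes a section ("add more"):
ref = parse_ref("op://vg6xf6vvfmoh5hqjjhlhbeoaie/kyhzfifryqnuk7jeyibmmjvxxm/add more/root password")
print(ref.section, "/", ref.field)  # add more / root password
```

A three-segment reference (vault, item, field) covers every other secret fetched above.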
--- a/ansible/playbooks/ringtail.yml
+++ b/ansible/playbooks/ringtail.yml
@ -0,0 +1,118 @@
+---
+- name: Configure ringtail (NixOS)
+  hosts: ringtail
+  become: true
+
+  pre_tasks:
+    - name: Fetch 1Password Connect credentials from 1Password
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/1Password Connect/credentials-file"
+      register: _op_credentials
+      changed_when: false
+      delegate_to: localhost
+      become: false
+
+    - name: Fetch 1Password Connect token from 1Password
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/1Password Connect/token"
+      register: _op_token
+      changed_when: false
+      delegate_to: localhost
+      become: false
+
+    - name: Fetch Forgejo runner registration token from 1Password
+      ansible.builtin.command:
+        cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/Forgejo Secrets/runner_reg"
+      register: _runner_reg
+      changed_when: false
+      delegate_to: localhost
+      become: false
+
+    - name: Ensure /etc/forgejo-runner directory exists
+      ansible.builtin.file:
+        path: /etc/forgejo-runner
+        state: directory
+        mode: "0700"
+
+    - name: Write Forgejo runner token file
+      ansible.builtin.copy:
+        content: "TOKEN={{ _runner_reg.stdout }}"
+        dest: /etc/forgejo-runner/token.env
+        mode: "0600"
+      no_log: true
+
+    - name: Ensure /etc/k3s directory exists
+      ansible.builtin.file:
+        path: /etc/k3s
+        state: directory
+        mode: "0700"
+
+    - name: Generate k3s token if not present
+      ansible.builtin.copy:
+        content: "{{ lookup('ansible.builtin.password', '/dev/null', chars=['hexdigits'], length=32) }}"
+        dest: /etc/k3s/token
+        mode: "0600"
+        force: false
+
+  tasks:
+    - name: Ensure blumeops repo is present
+      ansible.builtin.git:
+        repo: "https://forge.ops.eblu.me/eblume/blumeops.git"
+        dest: /etc/blumeops
+        version: "{{ ringtail_commit | default('main') }}"
+        force: true
+      register: _repo
+
+    - name: Rebuild NixOS
+      ansible.builtin.command:
+        cmd: nixos-rebuild switch --flake /etc/blumeops/nixos/ringtail#ringtail
+      register: _rebuild
+      changed_when: "'activating the configuration' in _rebuild.stderr"
+      when: _repo.changed
+
+    - name: Verify tailscale is connected
+      ansible.builtin.command: tailscale status --self --json
+      register: _ts_status
+      changed_when: false
+      failed_when: "'Running' not in _ts_status.stdout"
+
+  post_tasks:
+    - name: Wait for k3s to be ready
+      ansible.builtin.command: k3s kubectl get nodes
+      register: _k3s_ready
+      changed_when: false
+      retries: 30
+      delay: 5
+      until: _k3s_ready.rc == 0
+
+    - name: Create 1password namespace
+      ansible.builtin.command: k3s kubectl create namespace 1password
+      register: _ns
+      changed_when: _ns.rc == 0
+      failed_when: _ns.rc != 0 and 'AlreadyExists' not in _ns.stderr
+
+    - name: Create or update op-credentials secret
+      ansible.builtin.shell:
+        cmd: |
+          set -o pipefail
+          k3s kubectl create secret generic op-credentials \
+            --namespace=1password \
+            --from-literal=1password-credentials.json='{{ _op_credentials.stdout }}' \
+            --dry-run=client -o yaml | k3s kubectl apply -f -
+        executable: /run/current-system/sw/bin/bash
+      register: _op_credentials_apply
+      changed_when: "'configured' in _op_credentials_apply.stdout or 'created' in _op_credentials_apply.stdout"
+      no_log: true
+
+    - name: Create or update onepassword-token secret
+      ansible.builtin.shell:
+        cmd: |
+          set -o pipefail
+          k3s kubectl create secret generic onepassword-token \
+            --namespace=1password \
+            --from-literal=token={{ _op_token.stdout }} \
+            --dry-run=client -o yaml | k3s kubectl apply -f -
+        executable: /run/current-system/sw/bin/bash
+      register: _op_token_apply
+      changed_when: "'configured' in _op_token_apply.stdout or 'created' in _op_token_apply.stdout"
+      no_log: true
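The two secret tasks above pipe `kubectl create --dry-run=client -o yaml` into `kubectl apply -f -`, which makes them idempotent, and derive `changed_when` from apply's per-object status (`created`, `configured`, or `unchanged`). A hedged Python sketch of that check; `apply_changed` is a hypothetical name, and unlike the playbook's plain substring test it looks only at the trailing status word on each line:

```python
def apply_changed(stdout: str) -> bool:
    """Return True if `kubectl apply` created or updated any object.

    kubectl apply prints one "<kind>/<name> <status>" line per object,
    where status is e.g. "created", "configured", or "unchanged".
    """
    for line in stdout.splitlines():
        words = line.split()
        if words and words[-1] in ("created", "configured"):
            return True
    return False


print(apply_changed("secret/onepassword-token unchanged"))  # False
print(apply_changed("secret/op-credentials configured"))  # True
```

Checking the status word rather than any substring avoids a false positive if an object name ever contains "created" or "configured".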
--- a/ansible/playbooks/sifaka.yml
+++ b/ansible/playbooks/sifaka.yml
@ -0,0 +1,7 @@
+---
+- name: Configure sifaka
+  hosts: nas
+
+  roles:
+    - role: sifaka_exporters
+      tags: sifaka_exporters
--- a/ansible/roles/alloy/defaults/main.yml
+++ b/ansible/roles/alloy/defaults/main.yml
@ -10,10 +10,10 @@
 # Build on dev machine (gilbert), then copy to indri:
 #
 # 1. Clone from forge mirror:
-#    git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy
+#    git clone ssh://forgejo@forge.ops.eblu.me:2222/mirrors/alloy.git ~/code/3rd/alloy
 #
 # 2. Set up build tools via mise:
-#    cd ~/code/3rd/alloy && mise use go@1.25 node yarn
+#    cd ~/code/3rd/alloy && mise use go@1.25.7 node yarn
 #
 # 3. Build with CGO enabled (default in Makefile):
 #    cd ~/code/3rd/alloy && mise x -- make alloy
@ -21,7 +21,10 @@
 # 4. Copy binary to indri:
 #    scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy
 #
-# 5. Run ansible to deploy config and LaunchAgent
+# 5. Ad-hoc codesign on indri (SCP'd binaries get quarantined by macOS):
+#    ssh indri 'codesign --sign - --force ~/.local/bin/alloy'
+#
+# 6. Run ansible to deploy config and LaunchAgent

 # Binary and paths
 alloy_binary: /Users/erichblume/.local/bin/alloy
@ -32,11 +35,11 @@ alloy_log_dir: /Users/erichblume/Library/Logs
 # Textfile collector directory (same as node_exporter for compatibility)
 alloy_textfile_dir: /opt/homebrew/var/node_exporter/textfile

-# Prometheus remote write endpoint (k8s via Tailscale)
-alloy_prometheus_url: "https://prometheus.tail8d86e.ts.net/api/v1/write"
+# Prometheus remote write endpoint (k8s via Caddy)
+alloy_prometheus_url: "https://prometheus.ops.eblu.me/api/v1/write"

-# Loki endpoint (k8s via Tailscale)
-alloy_loki_url: "https://loki.tail8d86e.ts.net/loki/api/v1/push"
+# Loki endpoint (k8s via Caddy)
+alloy_loki_url: "https://loki.ops.eblu.me/loki/api/v1/push"

 # Instance label for metrics
 alloy_instance_label: indri
@ -72,11 +75,12 @@ alloy_mcquack_logs:
  - path: /Users/erichblume/Library/Logs/mcquack.zot.err.log
    service: zot
    stream: stderr
-
-alloy_plex_logs:
-  - path: /Users/erichblume/Library/Logs/Plex Media Server/Plex Media Server.log
-    service: plex
+  - path: /Users/erichblume/Library/Logs/mcquack.jellyfin.out.log
+    service: jellyfin
    stream: stdout
+  - path: /Users/erichblume/Library/Logs/mcquack.jellyfin.err.log
+    service: jellyfin
+    stream: stderr

 # Enable log collection (requires Loki to be running)
 alloy_collect_logs: true
@ -97,6 +101,10 @@ alloy_op_vault: vg6xf6vvfmoh5hqjjhlhbeoaie
 alloy_op_postgres_item: guxu3j7ajhjyey6xxl2ovsl2ui
 alloy_op_postgres_field: alloy-user-pw

+# Forgejo metrics collection
+alloy_collect_forgejo: true
+alloy_forgejo_port: 3001
+
 # macOS power metrics collection (via powermetrics, requires root)
 alloy_collect_power_metrics: true
 alloy_power_metrics_script: /usr/local/bin/macos-power-metrics
--- a/ansible/roles/alloy/tasks/main.yml
+++ b/ansible/roles/alloy/tasks/main.yml
@ -38,9 +38,7 @@

 - name: Fetch PostgreSQL metrics password from 1Password
  ansible.builtin.command:
-    cmd: >-
-      op --vault {{ alloy_op_vault }} item get {{ alloy_op_postgres_item }}
-      --fields {{ alloy_op_postgres_field }} --reveal
+    cmd: op read "op://{{ alloy_op_vault }}/{{ alloy_op_postgres_item }}/{{ alloy_op_postgres_field }}"
  delegate_to: localhost
  register: alloy_postgres_password_result
  changed_when: false
--- a/ansible/roles/alloy/templates/config.alloy.j2
+++ b/ansible/roles/alloy/templates/config.alloy.j2
@ -29,6 +29,11 @@ prometheus.relabel "instance" {
    target_label = "instance"
    replacement  = "{{ alloy_instance_label }}"
  }
+
+  rule {
+    target_label = "cluster"
+    replacement  = "indri"
+  }
 }

 // Push metrics to Prometheus via remote_write
@ -69,6 +74,18 @@ prometheus.scrape "zot" {
 }
 {% endif %}

+{% if alloy_collect_forgejo | default(false) %}
+// ============== FORGEJO METRICS ==============
+
+// Scrape Forgejo's native metrics endpoint
+prometheus.scrape "forgejo" {
+  targets         = [{"__address__" = "localhost:{{ alloy_forgejo_port }}"}]
+  metrics_path    = "/metrics"
+  forward_to      = [prometheus.relabel.instance.receiver]
+  scrape_interval = "{{ alloy_scrape_interval }}"
+}
+{% endif %}
+
 {% if alloy_collect_logs %}
 // ============== LOG COLLECTION ==============

@ -90,15 +107,6 @@ local.file_match "mcquack_logs" {
  ]
 }

-// Discover log files - Plex Media Server
-local.file_match "plex_logs" {
-  path_targets = [
-{% for log in alloy_plex_logs %}
-    {__path__ = "{{ log.path }}", service = "{{ log.service }}", stream = "{{ log.stream }}"},
-{% endfor %}
-  ]
-}
-
 // Read and forward brew service logs
 loki.source.file "brew_logs" {
  targets    = local.file_match.brew_logs.targets
@ -111,12 +119,6 @@ loki.source.file "mcquack_logs" {
  forward_to = [loki.relabel.add_host.receiver]
 }

-// Read and forward Plex logs
-loki.source.file "plex_logs" {
-  targets    = local.file_match.plex_logs.targets
-  forward_to = [loki.relabel.add_host.receiver]
-}
-
 // Add host label to all logs
 loki.relabel "add_host" {
  forward_to = [loki.write.loki.receiver]
@ -125,6 +127,11 @@ loki.relabel "add_host" {
    target_label = "host"
    replacement  = "{{ alloy_instance_label }}"
  }
+
+  rule {
+    target_label = "cluster"
+    replacement  = "indri"
+  }
 }

 // Write logs to Loki
--- a/ansible/roles/borgmatic/defaults/main.yml
+++ b/ansible/roles/borgmatic/defaults/main.yml
@ -6,6 +6,16 @@ borgmatic_log_dir: /Users/erichblume/Library/Logs
 # Full path to borg binary since LaunchAgent doesn't have homebrew in PATH
 borgmatic_local_path: /opt/homebrew/bin/borg

+# Borgmatic version — keep in sync with mise.toml in the repo root.
+# Ansible installs this via `mise install` so indri doesn't need the repo cloned.
+borgmatic_version: "2.1.4"
+
+# Full path to borgmatic binary — called directly by LaunchAgents to avoid
+# routing through mise, which triggers macOS TCC permission dialogs for
+# protected folders (e.g. ~/Documents) that hang headless LaunchAgent sessions.
+# Uses mise's "latest" symlink so version bumps don't break the LaunchAgent path.
+borgmatic_bin: /Users/erichblume/.local/share/mise/installs/pipx-borgmatic/latest/bin/borgmatic
+
 # Schedule: runs daily at 2:00 AM
 borgmatic_schedule_hour: 2
 borgmatic_schedule_minute: 0
@ -16,14 +26,47 @@ borgmatic_source_directories:
  - /opt/homebrew/var/forgejo
  - /Users/erichblume/.config/borgmatic
  - /Users/erichblume/Documents
-  - /Users/erichblume/Pictures
+  - /Users/erichblume/.local/share/borgmatic/k8s-dumps
+  # Shower app prize-photo uploads (sifaka SMB mount). Mounted manually
+  # on indri via Finder — see docs/how-to/operations/shower-app.md.
+  - /Volumes/shower

-# Backup repository
+# Backup repositories
 borgmatic_repositories:
  - path: /Volumes/backups/borg/
    label: sifaka-borg-backups
    encryption: repokey
    append_only: true
+  - path: ssh://u3ugi1x1@u3ugi1x1.repo.borgbase.com/./repo
+    label: borgbase-offsite
+    encryption: repokey
+    append_only: true
+
+# BorgBase SSH key (fetched from 1Password in playbook pre_tasks)
+borgmatic_borgbase_ssh_key_path: /Users/erichblume/.ssh/borgbase_ed25519
+
+# Directory for pre-backup database dumps from k8s pods
+borgmatic_k8s_dump_dir: /Users/erichblume/.local/share/borgmatic/k8s-dumps
+
+# K8s SQLite databases to dump before backup via kubectl exec
+# Each entry runs: kubectl exec <pod-selector> -- sqlite3 <path> ".backup /tmp/backup.db"
+# then copies the dump to borgmatic_k8s_dump_dir/<name>.db
+borgmatic_k8s_sqlite_dumps:
+  - name: mealie
+    namespace: mealie
+    label_selector: app=mealie
+    db_path: /app/data/mealie.db
+    # migrated to ringtail (wave-1); ssh to ringtail and run k3s kubectl
+    # there, same as shower below.
+    target: ssh:eblume@ringtail
+  - name: shower
+    namespace: shower
+    label_selector: app=shower
+    db_path: /app/data/db.sqlite3
+    # ssh to ringtail and run k3s kubectl there — avoids needing a
+    # ringtail kubeconfig on indri. k3s.yaml on ringtail is
+    # world-readable (mode 644), so no sudo required.
+    target: ssh:eblume@ringtail

 # Exclude patterns
 borgmatic_exclude_patterns: []
@ -39,14 +82,42 @@ borgmatic_keep_yearly: 1000
 # PostgreSQL databases to backup (streamed via pg_dump)
 # Password is read from ~/.pgpass (managed by this role)
 # pg_dump_command must be full path since LaunchAgent doesn't have homebrew in PATH
+# --- Immich photo library backup (BorgBase offsite only) ---
+borgmatic_photos_config: /Users/erichblume/.config/borgmatic/photos.yaml
+borgmatic_photos_source_directories:
+  - /Volumes/photos/library
+  - /Volumes/photos/upload
+borgmatic_photos_borgbase_repo: ssh://xcrtl5tg@xcrtl5tg.repo.borgbase.com/./repo
+# Schedule: runs daily at 4:00 AM (offset from main backup at 2:00 AM)
+borgmatic_photos_schedule_hour: 4
+borgmatic_photos_schedule_minute: 0
+# Retention: photos are precious, keep more history
+borgmatic_photos_keep_daily: 7
+borgmatic_photos_keep_monthly: 12
+borgmatic_photos_keep_yearly: 1000
+
 borgmatic_pg_dump_command: /opt/homebrew/opt/postgresql@18/bin/pg_dump
 borgmatic_postgresql_databases:
-  # k8s PostgreSQL (CloudNativePG)
+  # k8s PostgreSQL (CloudNativePG) via Caddy L4 proxy
  - name: miniflux
-    hostname: pg.tail8d86e.ts.net
+    hostname: pg.ops.eblu.me
    port: 5432
    username: borgmatic
+  - name: authentik
+    hostname: pg.ops.eblu.me
+    port: 5432
+    username: borgmatic
+  # migrated to ringtail blumeops-pg (wave-1); port 5434 = Caddy L4 route
  - name: teslamate
-    hostname: pg.tail8d86e.ts.net
-    port: 5432
+    hostname: pg.ops.eblu.me
+    port: 5434
+    username: borgmatic
+  - name: paperless
+    hostname: pg.ops.eblu.me
+    port: 5434
+    username: borgmatic
+  # immich-pg cluster (VectorChord) via Caddy L4 on port 5433
+  - name: immich
+    hostname: pg.ops.eblu.me
+    port: 5433
    username: borgmatic
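The photos retention above (`keep_daily: 7`, `keep_monthly: 12`, `keep_yearly: 1000`) is passed through to `borg prune`. Roughly, each rule keeps the newest archive in each of its N most recent periods that have archives; rules run from shortest to longest period, and archives already kept by an earlier rule do not count toward a later one. A simplified Python sketch of that selection, not borg's implementation (consult the borg prune documentation for edge cases); `prune_keep` and `PERIOD_KEYS` are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Two archives fall in the same period when their keys are equal.
PERIOD_KEYS: Dict[str, Callable[[datetime], Tuple[int, ...]]] = {
    "daily": lambda t: (t.year, t.month, t.day),
    "monthly": lambda t: (t.year, t.month),
    "yearly": lambda t: (t.year,),
}


def prune_keep(archives: List[datetime], rules: Dict[str, int]) -> List[datetime]:
    """Return the archive timestamps kept by daily/monthly/yearly rules."""
    newest_first = sorted(archives, reverse=True)
    kept: List[datetime] = []
    for rule in ("daily", "monthly", "yearly"):  # shortest period first
        limit = rules.get(rule, 0)
        key = PERIOD_KEYS[rule]
        last = None
        count = 0
        for ts in newest_first:
            if count >= limit:
                break
            period = key(ts)
            if period == last:
                continue  # an older archive in a period already visited
            last = period
            if ts in kept:
                continue  # kept by an earlier rule; does not count here
            kept.append(ts)
            count += 1
    return sorted(kept, reverse=True)


# 90 daily 04:00 archives from 2026-01-01 through 2026-03-31
start = datetime(2026, 1, 1, 4)
archives = [start + timedelta(days=i) for i in range(90)]
kept = prune_keep(archives, {"daily": 7, "monthly": 12, "yearly": 1000})
print(len(kept))  # 9: Mar 25-31, plus Feb 28 and Jan 31
```

March's monthly slot and 2026's yearly slot are both consumed by Mar 31, which the daily rule already kept.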
--- a/ansible/roles/borgmatic/handlers/main.yml
+++ b/ansible/roles/borgmatic/handlers/main.yml
@ -4,3 +4,9 @@
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.borgmatic.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.borgmatic.plist
  changed_when: true
+
+- name: Reload borgmatic-photos
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.borgmatic-photos.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.borgmatic-photos.plist
+  changed_when: true
--- a/ansible/roles/borgmatic/tasks/main.yml
+++ b/ansible/roles/borgmatic/tasks/main.yml
@ -1,6 +1,11 @@
 ---
-# Note: borgmatic is installed via mise (pipx), not managed here.
-# This role manages the config file and scheduled LaunchAgent.
+# Borgmatic is installed via mise (pipx) and called directly by LaunchAgents.
+# This role manages installation, config, and the scheduled LaunchAgents.
+
+- name: Install borgmatic via mise
+  ansible.builtin.command: mise install pipx:borgmatic@{{ borgmatic_version }}
+  register: borgmatic_install
+  changed_when: "'installed' in borgmatic_install.stderr"

 - name: Ensure borgmatic config directory exists
  ansible.builtin.file:
@ -14,11 +19,52 @@
  ansible.builtin.copy:
    content: |
      # Managed by ansible (borgmatic role) - k8s PostgreSQL backup credentials
-      pg.tail8d86e.ts.net:5432:*:borgmatic:{{ borgmatic_db_password }}
+      # 5432 = minikube blumeops-pg, 5433 = immich-pg, 5434 = ringtail blumeops-pg
+      pg.ops.eblu.me:5432:*:borgmatic:{{ borgmatic_db_password }}
+      pg.ops.eblu.me:5433:*:borgmatic:{{ borgmatic_db_password }}
+      pg.ops.eblu.me:5434:*:borgmatic:{{ borgmatic_db_password }}
    dest: ~/.pgpass
    mode: '0600'
  no_log: true

+# BorgBase offsite backup - SSH key and host verification
+- name: Deploy BorgBase SSH private key
+  ansible.builtin.copy:
+    content: "{{ borgbase_ssh_private_key }}\n"
+    dest: "{{ borgmatic_borgbase_ssh_key_path }}"
+    mode: '0600'
+  no_log: true
+
+- name: Add BorgBase host keys to known_hosts
+  ansible.builtin.known_hosts:
+    name: "{{ item }}"
+    key: "{{ item }} ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGU0mISTyHBw9tBs6SuhSq8tvNM8m9eifQxM+88TowPO"
+    state: present
+  loop:
+    - u3ugi1x1.repo.borgbase.com
+    - xcrtl5tg.repo.borgbase.com
+
+- name: Ensure k8s dump directory exists
+  ansible.builtin.file:
+    path: "{{ borgmatic_k8s_dump_dir }}"
+    state: directory
+    mode: '0700'
+  when: borgmatic_k8s_sqlite_dumps | length > 0
+
+- name: Ensure ~/bin exists
+  ansible.builtin.file:
+    path: "{{ ansible_env.HOME }}/bin"
+    state: directory
+    mode: '0755'
+  when: borgmatic_k8s_sqlite_dumps | length > 0
+
+- name: Deploy k8s SQLite dump helper script
+  ansible.builtin.template:
+    src: k8s-sqlite-dump.sh.j2
+    dest: "{{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump"
+    mode: '0755'
+  when: borgmatic_k8s_sqlite_dumps | length > 0
+
 - name: Deploy borgmatic configuration
  ansible.builtin.template:
    src: config.yaml.j2
@ -43,3 +89,30 @@
  when: borgmatic_launchctl_check.rc != 0
  changed_when: true
  failed_when: false
+
+# --- Immich photo library backup (BorgBase offsite only) ---
+
+- name: Deploy borgmatic photos configuration
+  ansible.builtin.template:
+    src: photos.yaml.j2
+    dest: "{{ borgmatic_photos_config }}"
+    mode: '0600'
+
+- name: Deploy borgmatic-photos LaunchAgent plist
+  ansible.builtin.template:
+    src: borgmatic-photos.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.borgmatic-photos.plist
+    mode: '0644'
+  notify: Reload borgmatic-photos
+
+- name: Check if borgmatic-photos LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.borgmatic-photos
+  register: borgmatic_photos_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load borgmatic-photos LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.borgmatic-photos.plist
+  when: borgmatic_photos_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
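The `~/.pgpass` task above now writes one line per Caddy L4 port because all three PostgreSQL clusters share the hostname `pg.ops.eblu.me`; only the port field tells them apart. libpq uses the password from the first line whose `hostname:port:database:username` fields match, where each field may be `*`. A simplified Python sketch of that matching, ignoring libpq's backslash escaping; `pgpass_lookup` is a hypothetical name, the passwords are placeholders, and the sample deliberately omits the 5434 line to show what would happen without it:

```python
from typing import Optional

# Placeholder passwords; the real file uses the borgmatic DB password.
PGPASS = """\
# 5432 = minikube blumeops-pg, 5433 = immich-pg, 5434 = ringtail blumeops-pg
pg.ops.eblu.me:5432:*:borgmatic:example-pw-1
pg.ops.eblu.me:5433:*:borgmatic:example-pw-2
"""


def pgpass_lookup(text: str, host: str, port: int, db: str, user: str) -> Optional[str]:
    """Return the password from the first line matching host:port:db:user.

    Each of the first four fields may be '*'. Simplified: backslash
    escapes are not handled and lines starting with '#' are skipped.
    """
    wanted = (host, str(port), db, user)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 4)
        if len(fields) != 5:
            continue  # malformed line
        *patterns, password = fields
        if all(p in ("*", w) for p, w in zip(patterns, wanted)):
            return password
    return None


print(pgpass_lookup(PGPASS, "pg.ops.eblu.me", 5433, "immich", "borgmatic"))  # example-pw-2
# Without a 5434 line, ringtail databases (teslamate, paperless) get no password.
print(pgpass_lookup(PGPASS, "pg.ops.eblu.me", 5434, "paperless", "borgmatic"))  # None
```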
--- a/ansible/roles/borgmatic/templates/borgmatic-photos.plist.j2
+++ b/ansible/roles/borgmatic/templates/borgmatic-photos.plist.j2
@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>KeepAlive</key>
+	<false/>
+	<key>Label</key>
+	<string>mcquack.eblume.borgmatic-photos</string>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<key>PATH</key>
+		<string>/opt/homebrew/bin:/usr/bin:/bin</string>
+	</dict>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ borgmatic_bin }}</string>
+		<string>--config</string>
+		<string>{{ borgmatic_photos_config }}</string>
+		<string>create</string>
+	</array>
+	<key>RunAtLoad</key>
+	<false/>
+	<key>StandardErrorPath</key>
+	<string>{{ borgmatic_log_dir }}/mcquack.borgmatic-photos.err.log</string>
+	<key>StandardOutPath</key>
+	<string>{{ borgmatic_log_dir }}/mcquack.borgmatic-photos.out.log</string>
+	<key>StartCalendarInterval</key>
+	<dict>
+		<key>Hour</key>
+		<integer>{{ borgmatic_photos_schedule_hour }}</integer>
+		<key>Minute</key>
+		<integer>{{ borgmatic_photos_schedule_minute }}</integer>
+	</dict>
+</dict>
+</plist>
--- a/ansible/roles/borgmatic/templates/borgmatic.plist.j2
+++ b/ansible/roles/borgmatic/templates/borgmatic.plist.j2
@ -14,16 +14,13 @@
 	</dict>
 	<key>ProgramArguments</key>
 	<array>
-		<string>/opt/homebrew/opt/mise/bin/mise</string>
-		<string>x</string>
-		<string>--</string>
-		<string>borgmatic</string>
+		<string>{{ borgmatic_bin }}</string>
 		<string>--config</string>
 		<string>{{ borgmatic_config }}</string>
 		<string>create</string>
 	</array>
 	<key>RunAtLoad</key>
-	<true/>
+	<false/>
 	<key>StandardErrorPath</key>
 	<string>{{ borgmatic_log_dir }}/mcquack.borgmatic.err.log</string>
 	<key>StandardOutPath</key>
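With `RunAtLoad` now `false`, loading the agent (including the unload/load in the reload handler) no longer starts a backup immediately; the job fires only at its `StartCalendarInterval`. A small Python sketch of the next daily fire time for a given `Hour`/`Minute`, ignoring launchd's sleep/wake coalescing and DST transitions; `next_fire` is a hypothetical name:

```python
from datetime import datetime, timedelta


def next_fire(now: datetime, hour: int, minute: int) -> datetime:
    """Next local time matching a daily StartCalendarInterval of Hour/Minute."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate


# Main backup at 02:00, photos at 04:00 (the role defaults)
now = datetime(2026, 6, 6, 3, 0)
print(next_fire(now, 2, 0))  # 2026-06-07 02:00:00
print(next_fire(now, 4, 0))  # 2026-06-06 04:00:00
```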
--- a/ansible/roles/borgmatic/templates/config.yaml.j2
+++ b/ansible/roles/borgmatic/templates/config.yaml.j2
@ -31,6 +31,26 @@ exclude_patterns:

 encryption_passcommand: {{ borgmatic_encryption_passcommand }}

+{% if borgmatic_k8s_sqlite_dumps %}
+# Pre-backup: dump SQLite databases from k8s pods.
+# Uses sqlite3.backup() for a safe, consistent copy.
+#
+# Quoting/escaping is delegated to ~/bin/borgmatic-k8s-sqlite-dump
+# (deployed by the borgmatic ansible role). Each entry's `target`
+# is either:
+#   - local:<context>  -> local kubectl with --context (mealie etc.)
+#   - ssh:<user@host>  -> ssh + k3s kubectl on the cluster host,
+#                         used for ringtail since indri's kubeconfig
+#                         deliberately doesn't carry that context.
+before_backup:
+    - mkdir -p {{ borgmatic_k8s_dump_dir }}
+{% for db in borgmatic_k8s_sqlite_dumps %}
+    - {{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump {{ db.target }} {{ db.namespace }} {{ db.label_selector }} {{ db.db_path }} {{ db.name }} {{ borgmatic_k8s_dump_dir }}/{{ db.name }}.db
+{% endfor %}
+{% endif %}
+
+ssh_command: ssh -o IdentitiesOnly=yes -i {{ borgmatic_borgbase_ssh_key_path }}
+
 # Retention policy
 keep_daily: {{ borgmatic_keep_daily }}
 keep_monthly: {{ borgmatic_keep_monthly }}
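Review note on the `before_backup` loop: the template and `borgmatic-k8s-sqlite-dump` share a six-argument positional contract (target, namespace, selector, db_path, name, dump target). Below is a minimal shell sketch of how one entry renders. `render_dump_cmd` and all argument values are hypothetical and are not taken from the role's defaults.

```bash
#!/usr/bin/env bash
# Sketch: render one before_backup command the way the Jinja loop does.
# render_dump_cmd and every value passed to it are hypothetical.
render_dump_cmd() {
    local home=$1 target=$2 namespace=$3 selector=$4 db_path=$5 name=$6 dump_dir=$7
    # Argument order matches the helper's usage line:
    #   <target> <namespace> <selector> <db_path> <name> <dump_target>
    # where <dump_target> is <dump_dir>/<name>.db.
    printf '%s/bin/borgmatic-k8s-sqlite-dump %s %s %s %s %s %s/%s.db\n' \
        "$home" "$target" "$namespace" "$selector" "$db_path" "$name" "$dump_dir" "$name"
}

render_dump_cmd /Users/example local:example-ctx shower app=shower \
    /data/shower.db shower /Users/example/.local/share/borgmatic/k8s
```

The template interpolates each field without quoting. A value containing whitespace would therefore shift the remaining positional arguments. Selectors like `app=shower` and paths without whitespace keep the contract intact.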
--- a/ansible/roles/borgmatic/templates/k8s-sqlite-dump.sh.j2
+++ b/ansible/roles/borgmatic/templates/k8s-sqlite-dump.sh.j2
@ -0,0 +1,73 @@
+#!/usr/bin/env bash
+# {{ ansible_managed }}
+#
+# Helper script invoked by borgmatic's before_backup hook to capture a
+# k8s pod's SQLite database. Keeps the borgmatic config readable by
+# pulling all the quoting out of YAML.
+#
+# Usage:
+#   borgmatic-k8s-sqlite-dump <target> <namespace> <selector> \
+#                             <db_path> <name> <dump_target>
+#
+# <target> is one of:
+#   local:<context>   - run local kubectl with --context=<context>
+#   ssh:<user@host>   - ssh to host and run k3s kubectl there
+#                       (no indri-side kubeconfig needed)
+#
+# <namespace>      - k8s namespace of the pod
+# <selector>       - label selector to find the pod (e.g. app=shower)
+# <db_path>        - absolute path inside the pod to the SQLite DB
+# <name>           - short name used for temp filenames
+# <dump_target>    - file on this host to receive the dump
+set -euo pipefail
+
+target=${1:?missing target}
+namespace=${2:?missing namespace}
+selector=${3:?missing selector}
+db_path=${4:?missing db path}
+name=${5:?missing name}
+dump_target=${6:?missing dump target}
+
+# Stage the backup next to the source DB (a guaranteed-writable volume);
+# minimal nix images (e.g. mealie) have no /tmp.
+pod_tmp="$(dirname "$db_path")/.borgmatic-backup-${name}.db"
+
+python_backup='import sqlite3; sqlite3.connect("'"$db_path"'").backup(sqlite3.connect("'"$pod_tmp"'"))'
+
+mode=${target%%:*}
+ref=${target#*:}
+
+case "$mode" in
+    local)
+        # Pulls dump bytes out via "kubectl exec -- cat" rather than
+        # "kubectl cp", which would otherwise need tar inside the pod
+        # (nix-built images like shower don't bundle tar).
+        context=$ref
+        kubectl="/opt/homebrew/bin/kubectl --context=$context -n $namespace"
+        pod=$($kubectl get pod -l "$selector" \
+            -o jsonpath='{.items[0].metadata.name}')
+        $kubectl exec "$pod" -- python3 -c "$python_backup"
+        $kubectl exec "$pod" -- cat "$pod_tmp" > "$dump_target"
+        $kubectl exec "$pod" -- rm -f "$pod_tmp"
+        ;;
+    ssh)
+        host=$ref
+        # Force bash on the remote (user's login shell on ringtail is
+        # fish). Pipe the script via stdin to dodge nested quoting.
+        # The dump bytes come back over the ssh stdout stream — no
+        # intermediate scp, no tar requirement in the pod.
+        ssh "$host" bash <<EOF > "$dump_target"
+set -euo pipefail
+export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
+pod=\$(k3s kubectl -n "$namespace" get pod -l "$selector" -o jsonpath='{.items[0].metadata.name}')
+k3s kubectl -n "$namespace" exec "\$pod" -- python3 -c '$python_backup' 1>&2
+k3s kubectl -n "$namespace" exec "\$pod" -- cat "$pod_tmp"
+k3s kubectl -n "$namespace" exec "\$pod" -- rm -f "$pod_tmp" 1>&2
+EOF
+        ;;
+    *)
+        echo "borgmatic-k8s-sqlite-dump: unknown target mode: $mode" >&2
+        echo "  expected local:<context> or ssh:<user@host>" >&2
+        exit 1
+        ;;
+esac
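Review note on the helper: it relies on two bash idioms. Prefix/suffix parameter expansion splits `<target>`, and quote splicing embeds the paths in the `python3 -c` program. The sketch below isolates both. `split_target` and `build_backup_snippet` are hypothetical names (the script inlines these expansions), and the paths are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: the two string manipulations the helper relies on.
# split_target and build_backup_snippet are hypothetical names.

# ${target%%:*} drops everything from the FIRST ':' onward (the mode);
# ${target#*:} drops everything up to and including that ':' (the ref).
split_target() {
    local target=$1
    printf '%s %s\n' "${target%%:*}" "${target#*:}"
}

# Splice the paths into a one-line Python program. Each '"$var"' segment
# closes the single quotes, inserts the variable double-quoted, then
# reopens single quotes, so the result is one shell word.
build_backup_snippet() {
    local db_path=$1 pod_tmp=$2
    printf '%s\n' 'import sqlite3; sqlite3.connect("'"$db_path"'").backup(sqlite3.connect("'"$pod_tmp"'"))'
}

split_target ssh:user@ringtail     # -> ssh user@ringtail
build_backup_snippet /data/app.db /data/.borgmatic-backup-app.db
```

`${target#*:}` has one consequence worth noting. A target with no colon at all (e.g. bare `local`) leaves the ref equal to the whole string, so the script would run kubectl with `--context=local` instead of reaching the `*)` error branch. The splice also assumes `db_path` contains no double quotes. In the `ssh` branch it must also contain no single quotes, because the remote script re-wraps the program in `'...'`.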
--- a/ansible/roles/borgmatic/templates/photos.yaml.j2
+++ b/ansible/roles/borgmatic/templates/photos.yaml.j2
@ -0,0 +1,37 @@
+# {{ ansible_managed }}
+#
+# Borgmatic config for immich photo library backup.
+# Backs up library/ and upload/ from /Volumes/photos (sifaka SMB mount)
+# to BorgBase offsite ONLY. Excludes encoded-video/, thumbs/, backups/
+# since those are regenerable from originals.
+#
+# Separate from the main borgmatic config to keep concerns isolated:
+# - main config: indri data → sifaka + borgbase
+# - this config: sifaka photos → borgbase (different repo)
+
+local_path: {{ borgmatic_local_path }}
+
+source_directories:
+{% for dir in borgmatic_photos_source_directories %}
+    - {{ dir }}
+{% endfor %}
+
+source_directories_must_exist: true
+
+repositories:
+    - path: {{ borgmatic_photos_borgbase_repo }}
+      label: borgbase-immich-photos
+      encryption: repokey
+      append_only: true
+
+encryption_passcommand: {{ borgmatic_encryption_passcommand }}
+
+ssh_command: ssh -o IdentitiesOnly=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=5 -i {{ borgmatic_borgbase_ssh_key_path }}
+
+# Save checkpoints every 10 minutes so interrupted backups don't lose all progress
+checkpoint_interval: 600
+
+# Retention policy — photos are precious, keep more history
+keep_daily: {{ borgmatic_photos_keep_daily }}
+keep_monthly: {{ borgmatic_photos_keep_monthly }}
+keep_yearly: {{ borgmatic_photos_keep_yearly }}
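Review note on `photos.yaml.j2`: the keepalive and checkpoint settings bound how a long offsite upload fails. Per ssh_config(5), ssh disconnects once `ServerAliveCountMax` probes, sent `ServerAliveInterval` seconds apart, go unanswered. That gives 30 × 5 ≈ 150 s. Chunks stored up to the last checkpoint are reused by the next run, so a checkpoint every 600 s caps repeated work at roughly 10 minutes. The arithmetic sketch below uses hypothetical function names.

```bash
#!/usr/bin/env bash
# Sketch: back-of-envelope bounds implied by photos.yaml.j2.
# keepalive_timeout and max_lost_minutes are hypothetical helpers.

# ssh gives up after ServerAliveCountMax unanswered probes,
# sent ServerAliveInterval seconds apart (ssh_config(5)).
keepalive_timeout() {
    local interval=$1 count_max=$2
    echo $(( interval * count_max ))
}

# Chunks stored up to the last checkpoint are reused by the next run,
# so at most about one checkpoint_interval of upload work is repeated.
max_lost_minutes() {
    local checkpoint_seconds=$1
    echo $(( checkpoint_seconds / 60 ))
}

echo "stalled ssh dropped after ~$(keepalive_timeout 30 5)s"
echo "repeated work after interruption: <= ~$(max_lost_minutes 600) min"
```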
--- a/ansible/roles/borgmatic_metrics/defaults/main.yml
+++ b/ansible/roles/borgmatic_metrics/defaults/main.yml
@ -1,7 +1,15 @@
 ---
-borgmatic_metrics_repo: /Volumes/backups/borg/
+# Borg repositories to collect metrics from
+# Each entry needs a path (local or ssh://) and a label for Prometheus metrics
+borgmatic_metrics_repos:
+  - path: /Volumes/backups/borg/
+    label: sifaka-local
+  - path: ssh://xcrtl5tg@xcrtl5tg.repo.borgbase.com/./repo
+    label: borgbase-immich-photos
+
 borgmatic_metrics_passcommand: cat /Users/erichblume/.borg/config.yaml
+borgmatic_metrics_ssh_key: /Users/erichblume/.ssh/borgbase_ed25519
 borgmatic_metrics_dir: /opt/homebrew/var/node_exporter/textfile
-borgmatic_metrics_script: /Users/erichblume/bin/borgmatic-metrics
+borgmatic_metrics_script: /Users/erichblume/.local/bin/borgmatic-metrics
 borgmatic_metrics_interval: 3600  # seconds between metric collection (hourly)
 borgmatic_metrics_log_dir: /opt/homebrew/var/log
--- a/ansible/roles/borgmatic_metrics/templates/borgmatic-metrics.sh.j2
+++ b/ansible/roles/borgmatic_metrics/templates/borgmatic-metrics.sh.j2
@ -1,11 +1,12 @@
 #!/bin/bash
 # {{ ansible_managed }}
 # Collects borg backup metrics for node_exporter textfile collector
+# Supports multiple repositories with a repo label for Prometheus

 set -euo pipefail

 export BORG_PASSCOMMAND="{{ borgmatic_metrics_passcommand }}"
-BORG_REPO="{{ borgmatic_metrics_repo }}"
+export BORG_RSH="ssh -o IdentitiesOnly=yes -i {{ borgmatic_metrics_ssh_key }}"
 OUTPUT_FILE="{{ borgmatic_metrics_dir }}/borgmatic.prom"
 TEMP_FILE="${OUTPUT_FILE}.tmp"

@ -13,129 +14,109 @@ TEMP_FILE="${OUTPUT_FILE}.tmp"
 BORG_CMD="/opt/homebrew/bin/borg"
 JQ_CMD="/opt/homebrew/bin/jq"

-# Get repository info
-repo_json=$($BORG_CMD info --json "$BORG_REPO" 2>/dev/null) || {
-    echo "Failed to get borg repo info" >&2
-    # Write down metric
-    cat > "$TEMP_FILE" << 'EOF'
+# Start fresh
+cat > "$TEMP_FILE" << 'EOF'
 # HELP borgmatic_up Borg backup repository is accessible
 # TYPE borgmatic_up gauge
-borgmatic_up 0
-EOF
-    mv "$TEMP_FILE" "$OUTPUT_FILE"
-    exit 0
-}
-
-# Get archive list
-archives_json=$($BORG_CMD list --json "$BORG_REPO" 2>/dev/null) || {
-    echo "Failed to list borg archives" >&2
-    exit 1
-}
-
-# Extract repository stats
-total_size=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_size')
-total_csize=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_csize')
-unique_size=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.unique_size')
-unique_csize=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.unique_csize')
-total_chunks=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_chunks')
-unique_chunks=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_unique_chunks')
-
-# Count archives
-archive_count=$(echo "$archives_json" | $JQ_CMD -r '.archives | length')
-
-# Get last archive info
-last_archive_name=$(echo "$archives_json" | $JQ_CMD -r '.archives[-1].name // empty')
-
-if [ -n "$last_archive_name" ]; then
-    # Get detailed info for the last archive
-    last_archive_json=$($BORG_CMD info --json "${BORG_REPO}::${last_archive_name}" 2>/dev/null) || {
-        echo "Failed to get last archive info" >&2
-        last_archive_json=""
-    }
-
-    if [ -n "$last_archive_json" ]; then
-        last_original_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.original_size')
-        last_compressed_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.compressed_size')
-        last_deduplicated_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.deduplicated_size')
-        last_nfiles=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.nfiles')
-        last_start=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].start')
-        last_end=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].end')
-        last_duration=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].duration')
-
-        # Convert timestamp to unix epoch
-        last_timestamp=$(date -j -f "%Y-%m-%dT%H:%M:%S" "${last_start%.*}" "+%s" 2>/dev/null || echo "0")
-    fi
-fi
-
-# Write metrics
-cat > "$TEMP_FILE" << EOF
-# HELP borgmatic_up Borg backup repository is accessible
-# TYPE borgmatic_up gauge
-borgmatic_up 1
-
 # HELP borgmatic_repo_original_size_bytes Total original size of all archives (sum of what each backup contains)
 # TYPE borgmatic_repo_original_size_bytes gauge
-borgmatic_repo_original_size_bytes $total_size
-
 # HELP borgmatic_repo_compressed_size_bytes Total compressed size of all archives
 # TYPE borgmatic_repo_compressed_size_bytes gauge
-borgmatic_repo_compressed_size_bytes $total_csize
-
 # HELP borgmatic_repo_deduplicated_size_bytes Actual disk usage after deduplication (unique data)
 # TYPE borgmatic_repo_deduplicated_size_bytes gauge
-borgmatic_repo_deduplicated_size_bytes $unique_csize
-
 # HELP borgmatic_repo_total_chunks Total number of chunks across all archives
 # TYPE borgmatic_repo_total_chunks gauge
-borgmatic_repo_total_chunks $total_chunks
-
 # HELP borgmatic_repo_unique_chunks Number of unique chunks (after deduplication)
 # TYPE borgmatic_repo_unique_chunks gauge
-borgmatic_repo_unique_chunks $unique_chunks
-
 # HELP borgmatic_archive_count Number of archives in the repository
 # TYPE borgmatic_archive_count gauge
-borgmatic_archive_count $archive_count
-EOF
-
-# Add last archive metrics if available
-if [ -n "${last_original_size:-}" ]; then
-    cat >> "$TEMP_FILE" << EOF
-
 # HELP borgmatic_last_archive_original_size_bytes Original size of the last archive (data being backed up)
 # TYPE borgmatic_last_archive_original_size_bytes gauge
-borgmatic_last_archive_original_size_bytes $last_original_size
-
 # HELP borgmatic_last_archive_compressed_size_bytes Compressed size of the last archive
 # TYPE borgmatic_last_archive_compressed_size_bytes gauge
-borgmatic_last_archive_compressed_size_bytes $last_compressed_size
-
 # HELP borgmatic_last_archive_deduplicated_size_bytes Deduplicated size of last archive (new data added)
 # TYPE borgmatic_last_archive_deduplicated_size_bytes gauge
-borgmatic_last_archive_deduplicated_size_bytes $last_deduplicated_size
-
 # HELP borgmatic_last_archive_files Number of files in the last archive
 # TYPE borgmatic_last_archive_files gauge
-borgmatic_last_archive_files $last_nfiles
-
 # HELP borgmatic_last_archive_timestamp Unix timestamp of the last backup
 # TYPE borgmatic_last_archive_timestamp gauge
-borgmatic_last_archive_timestamp $last_timestamp
-
 # HELP borgmatic_last_archive_duration_seconds Duration of the last backup in seconds
 # TYPE borgmatic_last_archive_duration_seconds gauge
-borgmatic_last_archive_duration_seconds ${last_duration:-0}
-EOF
-
-    # Collect per-source-directory sizes
-    cat >> "$TEMP_FILE" << 'EOF'
-
 # HELP borgmatic_source_size_bytes Size of each backup source directory in bytes
 # TYPE borgmatic_source_size_bytes gauge
 EOF

-    # List archive contents and group by source directory
-    $BORG_CMD list "${BORG_REPO}::${last_archive_name}" --format "{size} {path}{NL}" 2>/dev/null | awk '
+collect_repo_metrics() {
+    local repo_path="$1"
+    local repo_label="$2"
+
+    # Get repository info
+    repo_json=$($BORG_CMD info --json "$repo_path" 2>/dev/null) || {
+        echo "Failed to get borg repo info for $repo_label" >&2
+        echo "borgmatic_up{repo=\"$repo_label\"} 0" >> "$TEMP_FILE"
+        return
+    }
+
+    # Get archive list
+    archives_json=$($BORG_CMD list --json "$repo_path" 2>/dev/null) || {
+        echo "Failed to list borg archives for $repo_label" >&2
+        echo "borgmatic_up{repo=\"$repo_label\"} 0" >> "$TEMP_FILE"
+        return
+    }
+
+    # Extract repository stats
+    total_size=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_size')
+    total_csize=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_csize')
+    unique_size=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.unique_size')
+    unique_csize=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.unique_csize')
+    total_chunks=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_chunks')
+    unique_chunks=$(echo "$repo_json" | $JQ_CMD -r '.cache.stats.total_unique_chunks')
+    archive_count=$(echo "$archives_json" | $JQ_CMD -r '.archives | length')
+
+    cat >> "$TEMP_FILE" << EOF
+borgmatic_up{repo="$repo_label"} 1
+borgmatic_repo_original_size_bytes{repo="$repo_label"} $total_size
+borgmatic_repo_compressed_size_bytes{repo="$repo_label"} $total_csize
+borgmatic_repo_deduplicated_size_bytes{repo="$repo_label"} $unique_csize
+borgmatic_repo_total_chunks{repo="$repo_label"} $total_chunks
+borgmatic_repo_unique_chunks{repo="$repo_label"} $unique_chunks
+borgmatic_archive_count{repo="$repo_label"} $archive_count
+EOF
+
+    # Get last archive info
+    last_archive_name=$(echo "$archives_json" | $JQ_CMD -r '.archives[-1].name // empty')
+
+    if [ -z "$last_archive_name" ]; then
+        return
+    fi
+
+    # Get detailed info for the last archive
+    last_archive_json=$($BORG_CMD info --json "${repo_path}::${last_archive_name}" 2>/dev/null) || {
+        echo "Failed to get last archive info for $repo_label" >&2
+        return
+    }
+
+    last_original_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.original_size')
+    last_compressed_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.compressed_size')
+    last_deduplicated_size=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.deduplicated_size')
+    last_nfiles=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].stats.nfiles')
+    last_start=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].start')
+    last_duration=$(echo "$last_archive_json" | $JQ_CMD -r '.archives[0].duration')
+
+    # Convert timestamp to unix epoch
+    last_timestamp=$(date -j -f "%Y-%m-%dT%H:%M:%S" "${last_start%.*}" "+%s" 2>/dev/null || echo "0")
+
+    cat >> "$TEMP_FILE" << EOF
+borgmatic_last_archive_original_size_bytes{repo="$repo_label"} $last_original_size
+borgmatic_last_archive_compressed_size_bytes{repo="$repo_label"} $last_compressed_size
+borgmatic_last_archive_deduplicated_size_bytes{repo="$repo_label"} $last_deduplicated_size
+borgmatic_last_archive_files{repo="$repo_label"} $last_nfiles
+borgmatic_last_archive_timestamp{repo="$repo_label"} $last_timestamp
+borgmatic_last_archive_duration_seconds{repo="$repo_label"} ${last_duration:-0}
+EOF
+
+    # Collect per-source-directory sizes
+    $BORG_CMD list "${repo_path}::${last_archive_name}" --format "{size} {path}{NL}" 2>/dev/null | awk -v repo="$repo_label" '
    {
        size = $1
        path = $2
@ -145,8 +126,10 @@ EOF
        else if (path ~ /^Users\/[^\/]+\/devpi/) { source = "devpi" }
        else if (path ~ /^Users\/[^\/]+\/code\/personal\/zk/) { source = "Zettelkasten" }
        else if (path ~ /^Users\/[^\/]+\/.config\/borgmatic/) { source = "borgmatic_config" }
+        else if (path ~ /^Users\/[^\/]+\/.local\/share\/borgmatic/) { source = "k8s_dumps" }
        else if (path ~ /^opt\/homebrew\/var\/forgejo/) { source = "Forgejo" }
        else if (path ~ /^opt\/homebrew\/var\/loki/) { source = "Loki" }
+        else if (path ~ /^Volumes\/photos/) { source = "immich_photos" }
        else if (path ~ /^borgmatic\/postgresql_databases/) { source = "PostgreSQL" }
        else if (path ~ /^borgmatic\//) { source = "borgmatic_metadata" }
        else { source = "other" }
@ -155,10 +138,15 @@ EOF
    }
    END {
        for (src in totals) {
-            printf "borgmatic_source_size_bytes{source=\"%s\"} %.0f\n", src, totals[src]
+            printf "borgmatic_source_size_bytes{repo=\"%s\",source=\"%s\"} %.0f\n", repo, src, totals[src]
        }
    }' >> "$TEMP_FILE"
-fi
+}
+
+# Collect metrics for each configured repository
+{% for repo in borgmatic_metrics_repos %}
+collect_repo_metrics "{{ repo.path }}" "{{ repo.label }}"
+{% endfor %}

 # Atomic move
 mv "$TEMP_FILE" "$OUTPUT_FILE"
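Review note on the metrics script: each repo's per-source sizes now carry a `repo` label, passed into awk with `-v`. The sketch below is a trimmed classifier that keeps only the two prefixes this change adds plus the `other` fallback. `classify_sizes`, the sample listing, and the labels are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: trimmed copy of the awk classifier (two new prefixes + fallback).
# classify_sizes and the sample input are hypothetical.
classify_sizes() {
    local repo=$1
    awk -v repo="$repo" '
    {
        size = $1
        path = $2
        if (path ~ /^Volumes\/photos/) { source = "immich_photos" }
        else if (path ~ /^Users\/[^\/]+\/.local\/share\/borgmatic/) { source = "k8s_dumps" }
        else { source = "other" }
        totals[source] += size
    }
    END {
        # for-in order is unspecified, so do not rely on line order
        for (src in totals) {
            printf "borgmatic_source_size_bytes{repo=\"%s\",source=\"%s\"} %.0f\n", repo, src, totals[src]
        }
    }'
}

printf '%s\n' \
    '100 Volumes/photos/library/a.jpg' \
    '50 Volumes/photos/upload/b.jpg' \
    '7 Users/example/.local/share/borgmatic/shower.db' \
    | classify_sizes borgbase-immich-photos
```

Awk splits `{size} {path}` on whitespace, so a path containing spaces is truncated at `$2`. The classifier only inspects prefixes, so the bucket and the summed size are unaffected.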
--- a/ansible/roles/caddy/defaults/main.yml
+++ b/ansible/roles/caddy/defaults/main.yml
@ -0,0 +1,128 @@
+---
+# Caddy reverse proxy configuration
+# Caddy is built from ~/code/3rd/caddy with Gandi DNS and Layer 4 plugins
+
+caddy_repo_dir: /Users/erichblume/code/3rd/caddy
+caddy_binary: "{{ caddy_repo_dir }}/bin/caddy"
+caddy_config_dir: /Users/erichblume/.config/caddy
+caddy_data_dir: /Users/erichblume/.local/share/caddy
+caddy_log_dir: /Users/erichblume/Library/Logs
+
+# Gandi API token file (written by ansible, chmod 0600)
+# Caddy reads this file for ACME DNS-01 challenges
+caddy_gandi_token_file: /Users/erichblume/.config/caddy/gandi-token
+
+# Domain configuration
+caddy_domain: ops.eblu.me
+
+# HTTPS port (443 is standard)
+caddy_https_port: 443
+
+# Services to proxy
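+# Format: { name: "service", host: "hostname", backend: "url" }; static sites set kind: static and root instead of backend (optional try_html, download_paths); optional cache_policy: spa adds SPA cache headers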
+caddy_services:
+  # Indri-local services
+  - name: forge
+    host: "forge.{{ caddy_domain }}"
+    backend: "http://localhost:3001"
+  - name: registry
+    host: "registry.{{ caddy_domain }}"
+    backend: "http://localhost:5050"
+  - name: jellyfin
+    host: "jellyfin.{{ caddy_domain }}"
+    backend: "http://localhost:8096"
+
+  # K8s services (via Tailscale Ingress)
+  # Caddy proxies to existing Tailscale endpoints - traffic stays local
+  - name: grafana
+    host: "grafana.{{ caddy_domain }}"
+    backend: "https://grafana.tail8d86e.ts.net"
+  - name: argocd
+    host: "argocd.{{ caddy_domain }}"
+    backend: "https://argocd.tail8d86e.ts.net"
+  - name: prometheus
+    host: "prometheus.{{ caddy_domain }}"
+    backend: "https://prometheus.tail8d86e.ts.net"
+  - name: loki
+    host: "loki.{{ caddy_domain }}"
+    backend: "https://loki.tail8d86e.ts.net"
+  - name: miniflux
+    host: "feed.{{ caddy_domain }}"
+    backend: "https://feed.tail8d86e.ts.net"
+  - name: devpi
+    host: "pypi.{{ caddy_domain }}"
+    backend: "http://localhost:3141"
+  - name: heph
+    host: "heph.{{ caddy_domain }}"
+    backend: "http://localhost:8787"  # hephaestus hub (server mode) + PWA shell
+  - name: kiwix
+    host: "kiwix.{{ caddy_domain }}"
+    backend: "https://kiwix.tail8d86e.ts.net"
+  - name: torrent
+    host: "torrent.{{ caddy_domain }}"
+    backend: "https://torrent.tail8d86e.ts.net"
+  - name: teslamate
+    host: "tesla.{{ caddy_domain }}"
+    backend: "https://tesla.tail8d86e.ts.net"
+  - name: immich
+    host: "photos.{{ caddy_domain }}"
+    backend: "https://photos.tail8d86e.ts.net"
+  - name: navidrome
+    host: "dj.{{ caddy_domain }}"
+    backend: "https://dj.tail8d86e.ts.net"
+  - name: homepage
+    host: "go.{{ caddy_domain }}"
+    backend: "https://go.tail8d86e.ts.net"
+  - name: docs
+    host: "docs.{{ caddy_domain }}"
+    kind: static
+    root: "{{ docs_content_dir }}"
+    try_html: true  # Quartz: path → path/ → path.html → 404.html
+  - name: cv
+    host: "cv.{{ caddy_domain }}"
+    kind: static
+    root: "{{ cv_content_dir }}"
+    download_paths:
+      - path: /resume.pdf
+        filename: erich-blume-resume.pdf
+  - name: nvr
+    host: "nvr.{{ caddy_domain }}"
+    backend: "https://nvr.tail8d86e.ts.net"
+  - name: authentik
+    host: "authentik.{{ caddy_domain }}"
+    backend: "https://authentik.tail8d86e.ts.net"
+    cache_policy: spa
+  - name: ntfy
+    host: "ntfy.{{ caddy_domain }}"
+    backend: "https://ntfy.tail8d86e.ts.net"
+  - name: ollama
+    host: "ollama.{{ caddy_domain }}"
+    backend: "https://ollama.tail8d86e.ts.net"
+  - name: mealie
+    host: "meals.{{ caddy_domain }}"
+    backend: "https://meals.tail8d86e.ts.net"
+  - name: paperless
+    host: "paperless.{{ caddy_domain }}"
+    backend: "https://paperless.tail8d86e.ts.net"
+  - name: shower
+    host: "shower.{{ caddy_domain }}"
+    backend: "https://shower.tail8d86e.ts.net"
+  - name: sifaka
+    host: "nas.{{ caddy_domain }}"
+    backend: "http://sifaka:5000"
+
+# Layer 4 (TCP) services
+# Format: { port: external_port, backend: "host:port" }
+caddy_tcp_services:
+  - port: 2222
+    backend: "localhost:2200"  # Forgejo SSH
+  - port: 5432
+    backend: "pg.tail8d86e.ts.net:5432"  # PostgreSQL (blumeops-pg)
+  - port: 5433
+    backend: "immich-pg.tail8d86e.ts.net:5432"  # PostgreSQL (immich-pg)
+  - port: 5434
+    backend: "blumeops-pg-ringtail.tail8d86e.ts.net:5432"  # PostgreSQL (blumeops-pg on ringtail)
+  - port: "{{ sifaka_node_exporter_port }}"
+    backend: "sifaka:{{ sifaka_node_exporter_port }}"  # Sifaka node_exporter
+  - port: "{{ sifaka_smartctl_exporter_port }}"
+    backend: "sifaka:{{ sifaka_smartctl_exporter_port }}"  # Sifaka smartctl_exporter
--- a/ansible/roles/caddy/handlers/main.yml
+++ b/ansible/roles/caddy/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Restart caddy
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.caddy.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.caddy.plist
+  changed_when: true
--- a/ansible/roles/caddy/tasks/main.yml
+++ b/ansible/roles/caddy/tasks/main.yml
@ -0,0 +1,80 @@
+---
+# Caddy reverse proxy deployment
+# Binary is built manually - see ~/code/3rd/caddy/mise.toml
+
+- name: Verify caddy binary exists
+  ansible.builtin.stat:
+    path: "{{ caddy_binary }}"
+  register: caddy_bin
+  failed_when: not caddy_bin.stat.exists
+  changed_when: false
+
+- name: Create caddy config directory
+  ansible.builtin.file:
+    path: "{{ caddy_config_dir }}"
+    state: directory
+    mode: "0755"
+
+- name: Create caddy data directory
+  ansible.builtin.file:
+    path: "{{ caddy_data_dir }}"
+    state: directory
+    mode: "0755"
+
+- name: Fetch Gandi PAT (when running with --tags caddy)
+  ansible.builtin.command:
+    cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/mco6ka3dc3rmw7zkg2dhia5d2m/pat"
+  delegate_to: localhost
+  register: _caddy_gandi_token_fallback
+  changed_when: false
+  no_log: true
+  check_mode: false
+  when: caddy_gandi_token is not defined
+
+- name: Set Gandi token fact (fallback)
+  ansible.builtin.set_fact:
+    caddy_gandi_token: "{{ _caddy_gandi_token_fallback.stdout }}"
+  no_log: true
+  when: caddy_gandi_token is not defined
+
+- name: Write Gandi token file
+  ansible.builtin.copy:
+    content: "{{ caddy_gandi_token }}"
+    dest: "{{ caddy_gandi_token_file }}"
+    mode: "0600"
+  no_log: true
+  notify: Restart caddy
+
+- name: Deploy Caddyfile
+  ansible.builtin.template:
+    src: Caddyfile.j2
+    dest: "{{ caddy_config_dir }}/Caddyfile"
+    mode: "0644"
+  notify: Restart caddy
+
+- name: Deploy caddy wrapper script
+  ansible.builtin.template:
+    src: caddy-wrapper.sh.j2
+    dest: "{{ caddy_config_dir }}/caddy-wrapper.sh"
+    mode: "0755"
+  notify: Restart caddy
+
+- name: Deploy caddy LaunchAgent plist
+  ansible.builtin.template:
+    src: caddy.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.caddy.plist
+    mode: "0644"
+  notify: Restart caddy
+
+- name: Check if caddy LaunchAgent is loaded
+  ansible.builtin.command:
+    cmd: launchctl list mcquack.eblume.caddy
+  register: caddy_launchctl
+  changed_when: false
+  failed_when: false
+
+- name: Load caddy LaunchAgent
+  ansible.builtin.command:
+    cmd: launchctl load ~/Library/LaunchAgents/mcquack.eblume.caddy.plist
+  when: caddy_launchctl.rc != 0
+  changed_when: true
--- a/ansible/roles/caddy/templates/Caddyfile.j2
+++ b/ansible/roles/caddy/templates/Caddyfile.j2
@ -0,0 +1,87 @@
+# Caddy reverse proxy for blumeops services
+# Managed by ansible - do not edit manually
+#
+# All *.{{ caddy_domain }} requests are proxied to backend services.
+# TLS certificates are obtained via ACME DNS-01 challenge using Gandi.
+
+{
+	# Global options
+	admin off
+
+{% if caddy_tcp_services %}
+	# Layer 4 (TCP) routing
+	layer4 {
+{% for tcp_svc in caddy_tcp_services %}
+		:{{ tcp_svc.port }} {
+			route {
+				proxy {{ tcp_svc.backend }}
+			}
+		}
+{% endfor %}
+	}
+{% endif %}
+}
+
+# Wildcard certificate for all services
+*.{{ caddy_domain }}:{{ caddy_https_port }} {
+	tls {
+		dns gandi {env.GANDI_BEARER_TOKEN}
+	}
+
+{% for service in caddy_services %}
+	@{{ service.name }} host {{ service.host }}
+	handle @{{ service.name }} {
+{% if service.kind | default('proxy') == 'static' %}
+		root * {{ service.root }}
+		encode gzip
+		# Long-cache fingerprinted assets; everything else stays default.
+		@{{ service.name }}_assets path_regexp \.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2)$
+		header @{{ service.name }}_assets Cache-Control "public, max-age=31536000, immutable"
+{% for dl in service.download_paths | default([]) %}
+		@{{ service.name }}_dl{{ loop.index }} path {{ dl.path }}
+		header @{{ service.name }}_dl{{ loop.index }} Content-Disposition `attachment; filename="{{ dl.filename }}"`
+{% endfor %}
+{% if service.try_html | default(false) %}
+		# Quartz clean URLs: path → path/ → path.html → /404.html (200).
+		# Caddy's handle_errors is a top-level directive and can't live in
+		# this nested handle, so the 404 page rides as the final try_files
+		# candidate (served with 200 — acceptable for a human-facing 404).
+		try_files {path} {path}/ {path}.html /404.html
+{% endif %}
+		file_server
+{% else %}
+{% if service.cache_policy | default('') == 'spa' %}
+		# SPA cache policy: hashed static assets are immutable, HTML must revalidate.
+		# Prevents stale HTML from referencing chunk hashes that no longer exist.
+		@{{ service.name }}_static path /static/dist/*
+		header @{{ service.name }}_static Cache-Control "public, max-age=31536000, immutable"
+		@{{ service.name }}_html path /if/*
+		header @{{ service.name }}_html Cache-Control "no-cache"
+{% endif %}
+{% if service.backend.startswith('https://') %}
+		reverse_proxy {{ service.backend }} {
+			# Caddy v2.11+ rewrites Host to upstream for HTTPS backends.
+			# Preserve the original Host so services see *.ops.eblu.me.
+			header_up Host {http.request.host}
+		}
+{% else %}
+		reverse_proxy {{ service.backend }}
+{% endif %}
+{% endif %}
+	}
+
+{% endfor %}
+	# Fallback for unknown hosts
+	handle {
+		respond "Unknown service" 404
+	}
+}
+
+# Base domain (ops.eblu.me)
+{{ caddy_domain }}:{{ caddy_https_port }} {
+	tls {
+		dns gandi {env.GANDI_BEARER_TOKEN}
+	}
+
+	respond "blumeops services - use a subdomain (e.g., forge.{{ caddy_domain }})"
+}
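Review note on the docs site block: the `try_files` order is easiest to sanity-check against a real Quartz output directory. The shell emulation below assumes that a candidate with a trailing `/` matches a directory (as Caddy's `try_files` does). It ignores path cleaning and index handling. `resolve_try_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: emulate `try_files {path} {path}/ {path}.html /404.html`
# against a local directory. resolve_try_files is hypothetical; Caddy's
# real matcher also cleans paths and handles details skipped here.
resolve_try_files() {
    local root=$1 path=$2
    if [ -f "$root$path" ]; then
        echo "$path"                # exact file
    elif [ -d "$root$path" ]; then
        echo "${path%/}/"           # directory; file_server looks for index.html
    elif [ -f "$root$path.html" ]; then
        echo "$path.html"           # Quartz clean URL
    else
        echo "/404.html"            # final fallback, served with status 200
    fi
}

root=$(mktemp -d)
mkdir -p "$root/notes"
touch "$root/style.css" "$root/notes/index.html" "$root/about.html"
resolve_try_files "$root" /style.css   # -> /style.css
resolve_try_files "$root" /notes       # -> /notes/
resolve_try_files "$root" /about       # -> /about.html
resolve_try_files "$root" /missing     # -> /404.html
rm -rf "$root"
```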
--- a/ansible/roles/caddy/templates/caddy-wrapper.sh.j2
+++ b/ansible/roles/caddy/templates/caddy-wrapper.sh.j2
@ -0,0 +1,6 @@
+#!/bin/bash
+# Wrapper script for Caddy that loads the Gandi token from file
+# Managed by ansible - do not edit manually
+
+export GANDI_BEARER_TOKEN=$(cat {{ caddy_gandi_token_file }})
+exec {{ caddy_binary }} run --config {{ caddy_config_dir }}/Caddyfile
--- a/ansible/roles/caddy/templates/caddy.plist.j2
+++ b/ansible/roles/caddy/templates/caddy.plist.j2
@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>mcquack.eblume.caddy</string>
+
+    <key>ProgramArguments</key>
+    <array>
+        <string>{{ caddy_config_dir }}/caddy-wrapper.sh</string>
+    </array>
+
+    <key>WorkingDirectory</key>
+    <string>{{ caddy_data_dir }}</string>
+
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>XDG_DATA_HOME</key>
+        <string>/Users/erichblume/.local/share</string>
+        <key>XDG_CONFIG_HOME</key>
+        <string>/Users/erichblume/.config</string>
+    </dict>
+
+    <key>RunAtLoad</key>
+    <true/>
+
+    <key>KeepAlive</key>
+    <true/>
+
+    <key>StandardOutPath</key>
+    <string>{{ caddy_log_dir }}/mcquack.caddy.out.log</string>
+
+    <key>StandardErrorPath</key>
+    <string>{{ caddy_log_dir }}/mcquack.caddy.err.log</string>
+</dict>
+</plist>
--- a/ansible/roles/cv/defaults/main.yml
+++ b/ansible/roles/cv/defaults/main.yml
@ -0,0 +1,10 @@
+---
+# CV / resume static site (native, replaces minikube Deployment)
+# Caddy serves cv_content_dir directly via the static-kind service block.
+
+cv_version: "v1.0.3"
+cv_release_url: "https://forge.ops.eblu.me/api/packages/eblume/generic/cv/{{ cv_version }}/cv-{{ cv_version }}.tar.gz"
+
+cv_home: /Users/erichblume/blumeops/cv
+cv_content_dir: "{{ cv_home }}/content"
+cv_version_sentinel: "{{ cv_home }}/.installed-version"
--- a/ansible/roles/cv/tasks/main.yml
+++ b/ansible/roles/cv/tasks/main.yml
@ -0,0 +1,57 @@
+---
+# cv role — download and extract the CV release tarball into cv_content_dir.
+# Caddy serves the directory directly; there is no daemon to manage.
+#
+# Idempotency: a sentinel file records the installed cv_version. The
+# download/extract steps only run when the sentinel doesn't match cv_version.
+#
+# We use curl rather than ansible.builtin.get_url because the forge generic-
+# packages endpoint returns 405 on HEAD requests, which get_url issues before
+# downloading.
+
+- name: Ensure cv home exists
+  ansible.builtin.file:
+    path: "{{ cv_home }}"
+    state: directory
+    mode: '0755'
+
+- name: Read installed cv version sentinel
+  ansible.builtin.slurp:
+    src: "{{ cv_version_sentinel }}"
+  register: cv_installed_raw
+  failed_when: false
+  changed_when: false
+
+- name: Set installed cv version fact
+  ansible.builtin.set_fact:
+    cv_installed_version: >-
+      {{ (cv_installed_raw.content | b64decode).strip()
+         if (cv_installed_raw.content is defined) else '' }}
+
+- name: Recreate cv content dir
+  ansible.builtin.file:
+    path: "{{ cv_content_dir }}"
+    state: "{{ item }}"
+    mode: '0755'
+  loop:
+    - absent
+    - directory
+  when: cv_installed_version != cv_version
+
+- name: Download and extract cv release tarball
+  ansible.builtin.shell:
+    cmd: >-
+      set -euo pipefail;
+      curl -fsSL {{ cv_release_url | quote }} -o {{ cv_home }}/cv.tar.gz &&
+      tar -xzf {{ cv_home }}/cv.tar.gz -C {{ cv_content_dir }} &&
+      rm -f {{ cv_home }}/cv.tar.gz
+    executable: /bin/bash
+  when: cv_installed_version != cv_version
+  changed_when: true
+
+- name: Write cv version sentinel
+  ansible.builtin.copy:
+    content: "{{ cv_version }}\n"
+    dest: "{{ cv_version_sentinel }}"
+    mode: '0644'
+  when: cv_installed_version != cv_version
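Review note on the cv role: the sentinel makes the download conditional on a version change. The shell function below follows the same flow, with a stub standing in for the curl/tar step. `install_if_needed` is hypothetical, and its layout mirrors `cv_home`, `cv_content_dir`, and `cv_version_sentinel`.

```bash
#!/usr/bin/env bash
# Sketch: the cv role's sentinel check as a shell function.
# install_if_needed is hypothetical; the role does the real work with curl + tar.
install_if_needed() {
    local home=$1 want=$2
    local sentinel="$home/.installed-version"
    local have=""
    # A missing sentinel reads as "" (mirrors failed_when: false on slurp).
    if [ -f "$sentinel" ]; then
        # Drop whitespace; version strings contain none internally.
        have=$(tr -d '[:space:]' < "$sentinel")
    fi
    if [ "$have" = "$want" ]; then
        echo "unchanged"
        return 0
    fi
    rm -rf "$home/content" && mkdir -p "$home/content"
    # ...download and extract the release tarball into $home/content here...
    printf '%s\n' "$want" > "$sentinel"   # written last, like the role
    echo "installed $want"
}

demo_home=$(mktemp -d)
install_if_needed "$demo_home" v1.0.3   # -> installed v1.0.3
install_if_needed "$demo_home" v1.0.3   # -> unchanged
rm -rf "$demo_home"
```

Because the sentinel is written last, a failed download is retried on the next run. By then, however, the content directory has already been cleared, so it stays empty until a run succeeds.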
--- a/ansible/roles/devpi/defaults/main.yml
+++ b/ansible/roles/devpi/defaults/main.yml
@ -0,0 +1,21 @@
+---
+# devpi PyPI caching mirror (native launchd, replaces minikube StatefulSet)
+
+devpi_home: /Users/erichblume/devpi
+devpi_venv: "{{ devpi_home }}/venv"
+devpi_server_dir: "{{ devpi_home }}/server-dir"
+devpi_binary: "{{ devpi_venv }}/bin/devpi-server"
+devpi_init_binary: "{{ devpi_venv }}/bin/devpi-init"
+
+devpi_python_version: "3.12"
+devpi_server_version: "6.19.3"
+devpi_web_version: "5.0.2"
+
+devpi_host: 127.0.0.1
+devpi_port: 3141
+devpi_outside_url: "https://pypi.ops.eblu.me"
+
+devpi_log_dir: /Users/erichblume/Library/Logs
+
+# uv binary on indri — mise shim so version bumps via `mise upgrade uv` flow through transparently
+devpi_uv_binary: /Users/erichblume/.local/share/mise/shims/uv
--- a/ansible/roles/devpi/handlers/main.yml
+++ b/ansible/roles/devpi/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Restart devpi
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.devpi.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
+  changed_when: true
--- a/ansible/roles/devpi/tasks/main.yml
+++ b/ansible/roles/devpi/tasks/main.yml
@ -0,0 +1,71 @@
+---
+# devpi role — devpi-server in a uv-managed venv, run via LaunchAgent.
+# Replaces the prior minikube StatefulSet; see [[devpi-on-indri]].
+#
+# The root password is fetched in the indri.yml playbook pre_tasks and
+# exposed as `devpi_root_password`.
+
+- name: Ensure devpi home exists
+  ansible.builtin.file:
+    path: "{{ devpi_home }}"
+    state: directory
+    mode: '0755'
+
+- name: Ensure devpi server-dir exists
+  ansible.builtin.file:
+    path: "{{ devpi_server_dir }}"
+    state: directory
+    mode: '0700'
+
+- name: Create devpi venv if missing
+  ansible.builtin.command:
+    cmd: "{{ devpi_uv_binary }} venv --python {{ devpi_python_version }} {{ devpi_venv }}"
+    creates: "{{ devpi_venv }}/bin/python"
+
+- name: Install devpi-server and devpi-web into venv
+  # Always bootstrap from upstream PyPI — devpi is the index it would otherwise resolve through,
+  # and that's a circular dependency (devpi cannot install itself from itself).
+  ansible.builtin.command:
+    cmd: >-
+      {{ devpi_uv_binary }} pip install
+      --python {{ devpi_venv }}/bin/python
+      --index-url https://pypi.org/simple/
+      devpi-server=={{ devpi_server_version }}
+      devpi-web=={{ devpi_web_version }}
+  register: devpi_pip_install
+  changed_when: "'Installed' in devpi_pip_install.stderr or 'Uninstalled' in devpi_pip_install.stderr"
+  notify: Restart devpi
+
+- name: Check if devpi server-dir is initialized
+  ansible.builtin.stat:
+    path: "{{ devpi_server_dir }}/.serverversion"
+  register: devpi_serverversion
+
+- name: Initialize devpi server-dir
+  ansible.builtin.command:
+    cmd: >-
+      {{ devpi_init_binary }}
+      --serverdir {{ devpi_server_dir }}
+      --root-passwd {{ devpi_root_password | quote }}
+  when: not devpi_serverversion.stat.exists
+  changed_when: true
+  no_log: true
+
+- name: Deploy devpi LaunchAgent plist
+  ansible.builtin.template:
+    src: devpi.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
+    mode: '0644'
+  notify: Restart devpi
+
+- name: Check if devpi LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.devpi
+  register: devpi_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load devpi LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
+  when: devpi_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
--- a/ansible/roles/devpi/templates/devpi.plist.j2
+++ b/ansible/roles/devpi/templates/devpi.plist.j2
@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>Label</key>
+	<string>mcquack.eblume.devpi</string>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ devpi_binary }}</string>
+		<string>--serverdir</string>
+		<string>{{ devpi_server_dir }}</string>
+		<string>--host</string>
+		<string>{{ devpi_host }}</string>
+		<string>--port</string>
+		<string>{{ devpi_port }}</string>
+		<string>--outside-url</string>
+		<string>{{ devpi_outside_url }}</string>
+	</array>
+	<key>RunAtLoad</key>
+	<true/>
+	<key>KeepAlive</key>
+	<true/>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<key>PATH</key>
+		<string>{{ devpi_venv }}/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+	</dict>
+	<key>StandardOutPath</key>
+	<string>{{ devpi_log_dir }}/mcquack.devpi.out.log</string>
+	<key>StandardErrorPath</key>
+	<string>{{ devpi_log_dir }}/mcquack.devpi.err.log</string>
+</dict>
+</plist>
--- a/ansible/roles/docs/defaults/main.yml
+++ b/ansible/roles/docs/defaults/main.yml
@ -0,0 +1,10 @@
+---
+# Docs (Quartz-built static site) — replaces minikube Deployment.
+# Caddy serves docs_content_dir directly via the static-kind service block,
+# with Quartz-style try_files (path → path/ → path.html → 404).
+
+docs_version: "v1.17.0"
+docs_release_url: "https://forge.eblu.me/eblume/blumeops/releases/download/{{ docs_version }}/docs-{{ docs_version }}.tar.gz"
+docs_home: /Users/erichblume/blumeops/docs
+docs_content_dir: "{{ docs_home }}/content"
+docs_version_sentinel: "{{ docs_home }}/.installed-version"
--- a/ansible/roles/docs/tasks/main.yml
+++ b/ansible/roles/docs/tasks/main.yml
@ -0,0 +1,57 @@
+---
+# docs role — download and extract the Quartz-built docs tarball into
+# docs_content_dir. Caddy serves the directory directly with Quartz-style
+# try_files; there is no daemon to manage.
+#
+# Idempotency: a sentinel file records the installed docs_version. The
+# download/extract steps only run when the sentinel doesn't match docs_version.
+#
+# Mirrors the cv role's curl-based download for consistency, even though the
+# forge releases endpoint here does support HEAD.
+
+- name: Ensure docs home exists
+  ansible.builtin.file:
+    path: "{{ docs_home }}"
+    state: directory
+    mode: '0755'
+
+- name: Read installed docs version sentinel
+  ansible.builtin.slurp:
+    src: "{{ docs_version_sentinel }}"
+  register: docs_installed_raw
+  failed_when: false
+  changed_when: false
+
+- name: Set installed docs version fact
+  ansible.builtin.set_fact:
+    docs_installed_version: >-
+      {{ (docs_installed_raw.content | b64decode).strip()
+         if (docs_installed_raw.content is defined) else '' }}
+
+- name: Recreate docs content dir
+  ansible.builtin.file:
+    path: "{{ docs_content_dir }}"
+    state: "{{ item }}"
+    mode: '0755'
+  loop:
+    - absent
+    - directory
+  when: docs_installed_version != docs_version
+
+- name: Download and extract docs release tarball
+  ansible.builtin.shell:
+    cmd: >-
+      set -euo pipefail;
+      curl -fsSL {{ docs_release_url | quote }} -o {{ docs_home }}/docs.tar.gz &&
+      tar -xzf {{ docs_home }}/docs.tar.gz -C {{ docs_content_dir }} &&
+      rm -f {{ docs_home }}/docs.tar.gz
+    executable: /bin/bash
+  when: docs_installed_version != docs_version
+  changed_when: true
+
+- name: Write docs version sentinel
+  ansible.builtin.copy:
+    content: "{{ docs_version }}\n"
+    dest: "{{ docs_version_sentinel }}"
+    mode: '0644'
+  when: docs_installed_version != docs_version
--- a/ansible/roles/forgejo/defaults/main.yml
+++ b/ansible/roles/forgejo/defaults/main.yml
@ -4,22 +4,27 @@

 forgejo_app_name: Forgejo
 forgejo_app_slogan: "Beyond coding. We Forge."
-forgejo_run_user: forgejo
+forgejo_run_user: erichblume
 forgejo_run_mode: prod

-# Paths (brew-managed for now, will change to mcquack in Phase 3)
-forgejo_work_path: /opt/homebrew/var/forgejo
+# Source build paths
+forgejo_repo_dir: /Users/erichblume/code/3rd/forgejo
+forgejo_binary: "{{ forgejo_repo_dir }}/forgejo"
+
+# Data paths (migrated from brew to ~/forgejo)
+forgejo_work_path: /Users/erichblume/forgejo
 forgejo_config_path: "{{ forgejo_work_path }}/custom/conf/app.ini"
 forgejo_data_path: "{{ forgejo_work_path }}/data"
 forgejo_repo_root: "{{ forgejo_data_path }}/forgejo-repositories"
 forgejo_lfs_path: "{{ forgejo_data_path }}/lfs"
 forgejo_log_path: "{{ forgejo_work_path }}/log"
+forgejo_log_dir: /Users/erichblume/Library/Logs

 # Server settings
 forgejo_http_addr: 0.0.0.0
 forgejo_http_port: 3001
-forgejo_domain: forge.tail8d86e.ts.net
-forgejo_ssh_domain: "{{ forgejo_domain }}"
+forgejo_domain: forge.eblu.me
+forgejo_ssh_domain: forge.ops.eblu.me
 forgejo_root_url: "https://{{ forgejo_domain }}/"
 forgejo_offline_mode: true

@ -27,7 +32,7 @@ forgejo_offline_mode: true
 forgejo_disable_ssh: false
 forgejo_start_ssh_server: true
 forgejo_builtin_ssh_user: forgejo
-forgejo_ssh_port: 22
+forgejo_ssh_port: 2222
 forgejo_ssh_listen_port: 2200
 forgejo_lfs_start_server: true

--- a/ansible/roles/forgejo/handlers/main.yml
+++ b/ansible/roles/forgejo/handlers/main.yml
@ -1,4 +1,6 @@
 ---
 - name: Restart forgejo
-  ansible.builtin.command: brew services restart forgejo
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
  changed_when: true
--- a/ansible/roles/forgejo/tasks/main.yml
+++ b/ansible/roles/forgejo/tasks/main.yml
@ -1,16 +1,34 @@
 ---
-# Forgejo role
+# Forgejo role — source-built binary with LaunchAgent
 #
-# Currently uses brew-managed forgejo. Phase 3 of ci-cd-bootstrap will
-# transition to mcquack LaunchAgent with CI-built binary.
+# ONE-TIME SETUP (before running ansible):
+#
+# 1. Clone forgejo from codeberg (avoid circular dependency):
+#    ssh indri 'git clone https://codeberg.org/forgejo/forgejo.git ~/code/3rd/forgejo'
+#
+# 2. Add forge mirror as secondary remote:
+#    ssh indri 'cd ~/code/3rd/forgejo && git remote add forge https://forge.eblu.me/mirrors/forgejo.git'
+#
+# 3. Build (mise.toml handles Go/Node versions and build tags):
+#    ssh indri 'cd ~/code/3rd/forgejo && mise run build'
+#
+# 4. Run ansible to deploy config and LaunchAgent
 #
 # Secrets (lfs_jwt_secret, internal_token, oauth2_jwt_secret) are fetched
 # from 1Password in the playbook pre_tasks.

-- name: Install forgejo via homebrew
-  community.general.homebrew:
-    name: forgejo
-    state: present
+- name: Verify forgejo binary exists
+  ansible.builtin.stat:
+    path: "{{ forgejo_binary }}"
+  register: forgejo_binary_stat
+
+- name: Fail if forgejo binary not found
+  ansible.builtin.fail:
+    msg: |
+      Forgejo binary not found at {{ forgejo_binary }}.
+      Please build from source first:
+        ssh indri 'cd ~/code/3rd/forgejo && mise run build'
+  when: not forgejo_binary_stat.stat.exists

 - name: Ensure forgejo config directory exists
  ansible.builtin.file:
@ -25,8 +43,21 @@
    mode: '0600'
  notify: Restart forgejo

-- name: Ensure forgejo service is started
-  ansible.builtin.command: brew services start forgejo
-  register: forgejo_brew_start
-  changed_when: "'Successfully started' in forgejo_brew_start.stdout"
+- name: Deploy forgejo LaunchAgent plist
+  ansible.builtin.template:
+    src: forgejo.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
+    mode: '0644'
+  notify: Restart forgejo
+
+- name: Check if forgejo LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.forgejo
+  register: forgejo_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load forgejo LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
+  when: forgejo_launchctl_check.rc != 0
+  changed_when: true
  failed_when: false
--- a/ansible/roles/forgejo/templates/app.ini.j2
+++ b/ansible/roles/forgejo/templates/app.ini.j2
@ -20,6 +20,8 @@ SSH_LISTEN_PORT = {{ forgejo_ssh_listen_port }}
 LFS_START_SERVER = {{ forgejo_lfs_start_server | lower }}
 LFS_JWT_SECRET = {{ forgejo_lfs_jwt_secret }}
 OFFLINE_MODE = {{ forgejo_offline_mode | lower }}
+REVERSE_PROXY_LIMIT = 2
+REVERSE_PROXY_TRUSTED_PROXIES = *

 [database]
 DB_TYPE = {{ forgejo_db_type }}
@ -40,7 +42,7 @@ ENABLED = false
 REGISTER_EMAIL_CONFIRM = false
 ENABLE_NOTIFY_MAIL = false
 DISABLE_REGISTRATION = {{ forgejo_disable_registration | lower }}
-ALLOW_ONLY_EXTERNAL_REGISTRATION = false
+ALLOW_ONLY_EXTERNAL_REGISTRATION = true
 ENABLE_CAPTCHA = false
 REQUIRE_SIGNIN_VIEW = {{ forgejo_require_signin_view | lower }}
 DEFAULT_KEEP_EMAIL_PRIVATE = false
@ -52,9 +54,19 @@ NO_REPLY_ADDRESS = noreply.indri
 ENABLE_OPENID_SIGNIN = false
 ENABLE_OPENID_SIGNUP = false

+[mirror]
+DEFAULT_INTERVAL = 8h
+MIN_INTERVAL = 10m
+
 [cron.update_checker]
 ENABLED = false

+[cron.archive_cleanup]
+ENABLED = true
+RUN_AT_START = true
+SCHEDULE = @midnight
+OLDER_THAN = 2h
+
 [session]
 PROVIDER = {{ forgejo_session_provider }}

@ -77,6 +89,17 @@ PASSWORD_HASH_ALGO = pbkdf2_hi
 [oauth2]
 JWT_SECRET = {{ forgejo_oauth2_jwt_secret }}

+[oauth2_client]
+ENABLE_AUTO_REGISTRATION = true
+ACCOUNT_LINKING = login
+USERNAME = nickname
+REGISTER_EMAIL_CONFIRM = false
+
+[metrics]
+ENABLED = true
+ENABLED_ISSUE_BY_LABEL = false
+ENABLED_ISSUE_BY_REPOSITORY = false
+
 [actions]
 ENABLED = {{ forgejo_actions_enabled | lower }}
 DEFAULT_ACTIONS_URL = {{ forgejo_actions_default_url }}
--- a/ansible/roles/forgejo/templates/forgejo.plist.j2
+++ b/ansible/roles/forgejo/templates/forgejo.plist.j2
@ -0,0 +1,26 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>Label</key>
+	<string>mcquack.eblume.forgejo</string>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ forgejo_binary }}</string>
+		<string>-w</string>
+		<string>{{ forgejo_work_path }}</string>
+		<string>-c</string>
+		<string>{{ forgejo_config_path }}</string>
+		<string>web</string>
+	</array>
+	<key>RunAtLoad</key>
+	<true/>
+	<key>KeepAlive</key>
+	<true/>
+	<key>StandardOutPath</key>
+	<string>{{ forgejo_log_dir }}/mcquack.forgejo.out.log</string>
+	<key>StandardErrorPath</key>
+	<string>{{ forgejo_log_dir }}/mcquack.forgejo.err.log</string>
+</dict>
+</plist>
--- a/ansible/roles/forgejo_actions_secrets/defaults/main.yml
+++ b/ansible/roles/forgejo_actions_secrets/defaults/main.yml
@ -0,0 +1,24 @@
+---
+# Forgejo Actions Secrets role configuration
+#
+# This role syncs repository-level Actions secrets from 1Password to Forgejo
+# via the Forgejo API.
+
+forgejo_actions_secrets_api_url: "https://forge.eblu.me/api/v1"
+forgejo_actions_secrets_owner: eblume
+
+# Secrets to sync per repo.
+# Each entry: {repo: "name", secrets: [{name: "SECRET_NAME", value_var: "ansible_fact_name"}]}
+forgejo_actions_secrets_repos:
+  - repo: blumeops
+    secrets:
+      - name: ARGOCD_AUTH_TOKEN
+        value_var: forgejo_secret_argocd_token
+      - name: FLY_DEPLOY_TOKEN
+        value_var: forgejo_secret_fly_deploy_token
+      - name: ZOT_CI_API_KEY
+        value_var: forgejo_secret_zot_ci_api_key
+  - repo: cv
+    secrets:
+      - name: FORGE_TOKEN
+        value_var: forgejo_api_token
--- a/ansible/roles/forgejo_actions_secrets/tasks/main.yml
+++ b/ansible/roles/forgejo_actions_secrets/tasks/main.yml
@ -0,0 +1,32 @@
+---
+# Forgejo Actions Secrets role
+#
+# Syncs repository-level Actions secrets from 1Password to Forgejo via API.
+#
+# NOTE: This role runs on indri, which is also where Forgejo runs. The API
+# calls go from indri back to itself (via the public URL through Caddy).
+# This is intentional - it keeps the role simple and uses the same URL
+# that workflows use.
+#
+# Secrets (forgejo_api_token, forgejo_secret_*) are fetched from 1Password
+# in the playbook pre_tasks to minimize password prompts during provisioning.
+
+- name: Sync Actions secrets to Forgejo
+  ansible.builtin.uri:
+    url: "{{ forgejo_actions_secrets_api_url }}/repos/{{ forgejo_actions_secrets_owner }}/{{ item.0.repo }}/actions/secrets/{{ item.1.name }}"
+    method: PUT
+    headers:
+      Authorization: "token {{ forgejo_api_token }}"
+      Content-Type: "application/json"
+    body_format: json
+    body:
+      data: "{{ lookup('vars', item.1.value_var) }}"
+    status_code: [201, 204]
+  register: forgejo_actions_secrets_result
+  # API returns 201 for create, 204 for update. We can't check if value changed
+  # (secrets are write-only), so only report changed when creating new secrets.
+  changed_when: forgejo_actions_secrets_result.status == 201
+  loop: "{{ forgejo_actions_secrets_repos | subelements('secrets') }}"
+  loop_control:
+    label: "{{ item.0.repo }}/{{ item.1.name }}"
+  no_log: true
--- a/ansible/roles/forgejo_metrics/defaults/main.yml
+++ b/ansible/roles/forgejo_metrics/defaults/main.yml
@ -0,0 +1,20 @@
+---
+# Forgejo metrics collection configuration
+
+# Forgejo server URL
+forgejo_metrics_url: "http://localhost:3001"
+
+# Path to file containing Forgejo API token (should have 600 permissions)
+forgejo_metrics_api_key_file: "/Users/erichblume/.forgejo-api-key"
+
+# Metrics collection interval in seconds
+forgejo_metrics_interval: 60
+
+# Output directory for prometheus textfile collector
+forgejo_metrics_dir: /opt/homebrew/var/node_exporter/textfile
+
+# Script installation path
+forgejo_metrics_script: /Users/erichblume/.local/bin/forgejo-metrics
+
+# Log directory for metrics script output
+forgejo_metrics_log_dir: /opt/homebrew/var/log
--- a/ansible/roles/forgejo_metrics/handlers/main.yml
+++ b/ansible/roles/forgejo_metrics/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Reload forgejo-metrics
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
+  changed_when: true
--- a/ansible/roles/forgejo_metrics/tasks/main.yml
+++ b/ansible/roles/forgejo_metrics/tasks/main.yml
@ -0,0 +1,55 @@
+---
+- name: Fetch Forgejo API token (when running with --tags forgejo_metrics)
+  ansible.builtin.command:
+    cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/api-token"
+  delegate_to: localhost
+  register: forgejo_metrics_api_key_fallback
+  changed_when: false
+  no_log: true
+  check_mode: false
+  when: forgejo_metrics_api_key is not defined
+
+- name: Set Forgejo API token fact (fallback)
+  ansible.builtin.set_fact:
+    forgejo_metrics_api_key: "{{ forgejo_metrics_api_key_fallback.stdout }}"
+  no_log: true
+  when: forgejo_metrics_api_key is not defined
+
+- name: Write Forgejo API token file
+  ansible.builtin.copy:
+    content: "{{ forgejo_metrics_api_key }}"
+    dest: "{{ forgejo_metrics_api_key_file }}"
+    mode: '0600'
+  no_log: true
+
+- name: Ensure bin directory exists
+  ansible.builtin.file:
+    path: "{{ forgejo_metrics_script | dirname }}"
+    state: directory
+    mode: '0755'
+
+- name: Deploy forgejo metrics collection script
+  ansible.builtin.template:
+    src: forgejo-metrics.sh.j2
+    dest: "{{ forgejo_metrics_script }}"
+    mode: '0755'
+  notify: Reload forgejo-metrics
+
+- name: Deploy forgejo-metrics LaunchAgent plist
+  ansible.builtin.template:
+    src: forgejo-metrics.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
+    mode: '0644'
+  notify: Reload forgejo-metrics
+
+- name: Check if forgejo-metrics LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.forgejo-metrics
+  register: forgejo_metrics_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load forgejo-metrics LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
+  when: forgejo_metrics_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
--- a/ansible/roles/forgejo_metrics/templates/forgejo-metrics.plist.j2
+++ b/ansible/roles/forgejo_metrics/templates/forgejo-metrics.plist.j2
@ -0,0 +1,26 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>Label</key>
+	<string>mcquack.eblume.forgejo-metrics</string>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<key>PATH</key>
+		<string>/opt/homebrew/bin:/usr/bin:/bin</string>
+	</dict>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ forgejo_metrics_script }}</string>
+	</array>
+	<key>StartInterval</key>
+	<integer>{{ forgejo_metrics_interval }}</integer>
+	<key>RunAtLoad</key>
+	<true/>
+	<key>StandardErrorPath</key>
+	<string>{{ forgejo_metrics_log_dir }}/mcquack.forgejo-metrics.err.log</string>
+	<key>StandardOutPath</key>
+	<string>{{ forgejo_metrics_log_dir }}/mcquack.forgejo-metrics.out.log</string>
+</dict>
+</plist>
--- a/ansible/roles/forgejo_metrics/templates/forgejo-metrics.sh.j2
+++ b/ansible/roles/forgejo_metrics/templates/forgejo-metrics.sh.j2
@ -0,0 +1,162 @@
+#!/bin/bash
+# {{ ansible_managed }}
+# Collects Forgejo repository health metrics for node_exporter textfile collector
+
+set -euo pipefail
+
+FORGEJO_URL="{{ forgejo_metrics_url }}"
+API_KEY_FILE="{{ forgejo_metrics_api_key_file }}"
+OUTPUT_FILE="{{ forgejo_metrics_dir }}/forgejo.prom"
+TEMP_FILE="${OUTPUT_FILE}.tmp"
+
+TOKEN=$(cat "$API_KEY_FILE" 2>/dev/null | tr -d '\n' || true)
+
+# Authenticated API request; returns empty string on failure
+api() {
+    curl -sf -H "Authorization: token ${TOKEN}" -H "Accept: application/json" \
+        "${FORGEJO_URL}/api/v1${1}" 2>/dev/null || echo ""
+}
+
+# jq helper: convert ISO 8601 timestamp (with any tz offset) to epoch seconds
+# jq's fromdate only handles Z, so we parse the offset and apply it manually
+JQ_EPOCH='def epoch: sub("[.][0-9]+"; "") | if test("[+-][0-9]{2}:[0-9]{2}$") then capture("^(?<dt>.*)(?<sign>[+-])(?<h>[0-9]{2}):(?<m>[0-9]{2})$") | (.dt + "Z" | fromdate) as $base | ((.h | tonumber) * 3600 + (.m | tonumber) * 60) as $off | if .sign == "-" then $base + $off else $base - $off end else sub("Z$"; "") + "Z" | fromdate end;'
+
+forgejo_up=0
+if curl -sf "${FORGEJO_URL}/api/v1/version" >/dev/null 2>&1; then
+    forgejo_up=1
+fi
+
+{
+# --- Metric type declarations ---
+cat << 'HEADER'
+# HELP forgejo_up Forgejo server is up and responding
+# TYPE forgejo_up gauge
+# HELP forgejo_repo_open_pull_requests Number of open pull requests
+# TYPE forgejo_repo_open_pull_requests gauge
+# HELP forgejo_repo_open_issues Number of open issues
+# TYPE forgejo_repo_open_issues gauge
+# HELP forgejo_repo_language_bytes Repository language size in bytes
+# TYPE forgejo_repo_language_bytes gauge
+# HELP forgejo_repo_releases_total Total number of releases
+# TYPE forgejo_repo_releases_total gauge
+# HELP forgejo_repo_latest_release_timestamp_seconds Unix timestamp of the latest release
+# TYPE forgejo_repo_latest_release_timestamp_seconds gauge
+# HELP forgejo_repo_latest_commit_timestamp_seconds Unix timestamp of the latest commit on default branch
+# TYPE forgejo_repo_latest_commit_timestamp_seconds gauge
+# HELP forgejo_actions_runs_total Number of action runs by status among the 30 most recent runs
+# TYPE forgejo_actions_runs_total gauge
+# HELP forgejo_actions_run_duration_seconds Duration of the latest completed run per workflow in seconds
+# TYPE forgejo_actions_run_duration_seconds gauge
+# HELP forgejo_actions_last_success_timestamp_seconds Unix timestamp of last successful run per workflow
+# TYPE forgejo_actions_last_success_timestamp_seconds gauge
+# HELP forgejo_actions_jobs_waiting Number of action runs currently waiting or queued
+# TYPE forgejo_actions_jobs_waiting gauge
+# HELP forgejo_actions_jobs_running Number of action runs currently in progress
+# TYPE forgejo_actions_jobs_running gauge
+HEADER
+
+echo "forgejo_up ${forgejo_up}"
+
+if [ "$forgejo_up" -eq 1 ] && [ -n "$TOKEN" ]; then
+    # Discover all repos accessible to the token owner
+    repos_json=$(api "/repos/search?limit=50")
+    [ -z "$repos_json" ] && repos_json='{"data":[]}'
+
+    repo_count=$(echo "$repos_json" | jq '.data | length' 2>/dev/null || echo "0")
+
+    for ((i = 0; i < repo_count; i++)); do
+        repo_data=$(echo "$repos_json" | jq ".data[$i]")
+        full_name=$(echo "$repo_data" | jq -r '.full_name')
+        [ -z "$full_name" ] || [ "$full_name" = "null" ] && continue
+
+        r="$full_name"
+
+        # Basic repo metrics (from search results — no extra API call)
+        echo "forgejo_repo_open_pull_requests{repo=\"${r}\"} $(echo "$repo_data" | jq '.open_pr_counter // 0')"
+        echo "forgejo_repo_open_issues{repo=\"${r}\"} $(echo "$repo_data" | jq '.open_issues_count // 0')"
+
+        default_branch=$(echo "$repo_data" | jq -r '.default_branch // "main"')
+
+        # --- Languages ---
+        langs=$(api "/repos/${r}/languages")
+        if [ -n "$langs" ] && echo "$langs" | jq -e 'type == "object" and length > 0' >/dev/null 2>&1; then
+            echo "$langs" | jq -r --arg r "$r" \
+                'to_entries[] | "forgejo_repo_language_bytes{repo=\"\($r)\",language=\"\(.key)\"} \(.value)"' \
+                2>/dev/null || true
+        fi
+
+        # --- Releases ---
+        releases=$(api "/repos/${r}/releases?limit=50")
+        if [ -n "$releases" ] && echo "$releases" | jq -e 'type == "array"' >/dev/null 2>&1; then
+            echo "forgejo_repo_releases_total{repo=\"${r}\"} $(echo "$releases" | jq 'length')"
+            # Latest release timestamp and version
+            echo "$releases" | jq -r --arg r "$r" "${JQ_EPOCH}"'
+                if length > 0 then
+                    .[0] |
+                    "forgejo_repo_latest_release_timestamp_seconds{repo=\"\($r)\",version=\"\(.tag_name)\"} \((.published_at // .created_at // .created) | epoch)"
+                else empty end' 2>/dev/null || true
+        else
+            echo "forgejo_repo_releases_total{repo=\"${r}\"} 0"
+        fi
+
+        # --- Latest commit on default branch ---
+        commits=$(api "/repos/${r}/commits?limit=1&sha=${default_branch}")
+        if [ -n "$commits" ] && echo "$commits" | jq -e 'type == "array" and length > 0' >/dev/null 2>&1; then
+            echo "$commits" | jq -r --arg r "$r" "${JQ_EPOCH}"'
+                .[0] |
+                "forgejo_repo_latest_commit_timestamp_seconds{repo=\"\($r)\"} \((.created // .commit.committer.date) | epoch)"' \
+                2>/dev/null || true
+        fi
+
+        # --- Action runs ---
+        runs_json=$(api "/repos/${r}/actions/runs?limit=30")
+        if [ -n "$runs_json" ] && echo "$runs_json" | jq -e '.workflow_runs | type == "array"' >/dev/null 2>&1; then
+            # Count by status
+            echo "$runs_json" | jq -r --arg r "$r" '
+                .workflow_runs | group_by(.status) | .[] |
+                "forgejo_actions_runs_total{repo=\"\($r)\",status=\"\(.[0].status)\"} \(length)"' \
+                2>/dev/null || true
+
+            # Jobs waiting/running
+            waiting=$(echo "$runs_json" | jq '[.workflow_runs[] | select(.status == "waiting" or .status == "queued")] | length' 2>/dev/null || echo "0")
+            running=$(echo "$runs_json" | jq '[.workflow_runs[] | select(.status == "running")] | length' 2>/dev/null || echo "0")
+            echo "forgejo_actions_jobs_waiting{repo=\"${r}\"} ${waiting}"
+            echo "forgejo_actions_jobs_running{repo=\"${r}\"} ${running}"
+
+            # Discover current workflow files on the default branch (.forgejo/ or .github/)
+            current_wfs=""
+            for wf_dir in .forgejo/workflows .github/workflows; do
+                wf_list=$(api "/repos/${r}/contents/${wf_dir}?ref=${default_branch}")
+                if [ -n "$wf_list" ] && echo "$wf_list" | jq -e 'type == "array"' >/dev/null 2>&1; then
+                    current_wfs=$(echo "$wf_list" | jq -r '[.[].name] | join(",")' 2>/dev/null || true)
+                    break
+                fi
+            done
+
+            # Per-workflow: latest completed run duration and last success timestamp
+            # Only include workflows that currently exist on the default branch
+            # Forgejo fields: workflow_id (filename), created/stopped, duration (nanoseconds)
+            if [ -n "$current_wfs" ]; then
+                echo "$runs_json" | jq -r --arg r "$r" --arg wfs "$current_wfs" "${JQ_EPOCH}"'
+                    ($wfs | split(",")) as $current |
+                    [.workflow_runs[] | select((.status == "success" or .status == "failure") and (.workflow_id | IN($current[])))] |
+                    if length > 0 then
+                        group_by(.workflow_id) | .[] |
+                        (sort_by(.created) | reverse) as $sorted |
+                        ($sorted[0]) as $latest |
+                        ($latest.workflow_id | sub("[.]ya?ml$"; "")) as $wf |
+                        "forgejo_actions_run_duration_seconds{repo=\"\($r)\",workflow=\"\($wf)\"} \(($latest.duration // 0) / 1000000000 | floor)",
+                        ([$sorted[] | select(.status == "success")] |
+                            if length > 0 then
+                                .[0] as $last_ok |
+                                "forgejo_actions_last_success_timestamp_seconds{repo=\"\($r)\",workflow=\"\($wf)\"} \($last_ok.stopped | epoch)"
+                            else empty end)
+                    else empty end' 2>/dev/null || true
+            fi
+        fi
+    done
+fi
+} > "$TEMP_FILE"
+
+# Atomic move
+mv "$TEMP_FILE" "$OUTPUT_FILE"
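The timezone correction inside `JQ_EPOCH` reduces to simple arithmetic: read the timestamp's local wall-clock time as if it were UTC, then subtract a positive offset or add a negative one. A pure-bash sketch of just that step follows. `apply_tz_offset` is a hypothetical helper, and it assumes the offset is either `Z` or the `+HH:MM`/`-HH:MM` form that the jq regex matches.

```bash
#!/usr/bin/env bash
# Sketch of the offset step in JQ_EPOCH (hypothetical helper, not in the script).
# $1: epoch seconds obtained by reading the local wall-clock time as UTC
# $2: the timestamp's offset, either "Z" or "+HH:MM" / "-HH:MM"
apply_tz_offset() {
    local base="$1" offset="$2"
    if [[ "$offset" == "Z" ]]; then
        echo "$base"
        return
    fi
    local sign="${offset:0:1}" hh="${offset:1:2}" mm="${offset:4:2}"
    # 10# forces base 10 so "08" and "09" are not rejected as octal
    local off=$(( 10#$hh * 3600 + 10#$mm * 60 ))
    if [[ "$sign" == "-" ]]; then
        # West of UTC: local time lags UTC, so add the offset
        echo $(( base + off ))
    else
        # East of UTC: local time leads UTC, so subtract the offset
        echo $(( base - off ))
    fi
}
```

For example, `2026-06-06T18:29:47-07:00` is `2026-06-07T01:29:47Z`, so the helper adds 25200 seconds to the naive epoch.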
--- a/ansible/roles/heph/defaults/main.yml
+++ b/ansible/roles/heph/defaults/main.yml
@ -0,0 +1,49 @@
+---
+# hephaestus hub — the canonical heph replica (server mode) on indri.
+# Other devices (e.g. gilbert) are spokes that sync against this hub.
+# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
+
+# Pinned release used for the initial `cargo install` and the PWA shell.
+# After bootstrap, hephd's own --self-update keeps the binary current; this
+# pin only governs the first install and the bundled PWA shell version.
+heph_version: v1.2.1
+
+# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
+# install and unattended self-update build from the same source (no ssh-agent).
+heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
+
+heph_bin_dir: /Users/erichblume/.cargo/bin
+heph_binary: "{{ heph_bin_dir }}/hephd"
+
+# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
+# LaunchAgent and ansible run in falls back to rustup's *default* toolchain,
+# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
+# channel explicitly so both the bootstrap build and unattended self-update
+# always use a current toolchain regardless of the host's rustup default.
+heph_rust_toolchain: stable
+
+heph_data_dir: /Users/erichblume/.local/share/heph
+heph_db: "{{ heph_data_dir }}/heph.db"
+heph_socket: "{{ heph_data_dir }}/hephd.sock"
+heph_log_dir: /Users/erichblume/Library/Logs
+
+# Version-pinned source checkout; the PWA static shell is served directly from
+# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
+heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
+heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
+
+# Hub listens on all interfaces so tailnet spokes can reach it directly
+# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
+# Access is gated by Authentik OIDC regardless — tailnet reachability is not
+# enough (this is the owner's most sensitive data).
+heph_http_addr: 0.0.0.0:8787
+heph_port: 8787
+heph_external_url: https://heph.ops.eblu.me
+
+# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
+# the device-code client id (see argocd/manifests/authentik heph blueprint).
+heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/
+heph_oidc_audience: heph
+
+# Self-update poll interval (seconds). 10 minutes.
+heph_self_update_interval_secs: 600
--- a/ansible/roles/heph/handlers/main.yml
+++ b/ansible/roles/heph/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Restart heph
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
+  changed_when: true
--- a/ansible/roles/heph/tasks/main.yml
+++ b/ansible/roles/heph/tasks/main.yml
@ -0,0 +1,82 @@
+---
+# hephaestus hub (server mode) on indri.
+#
+# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub
+# adopts gilbert's existing data instead of being born empty):
+#
+#   1. On the seed device (gilbert):   heph daemon stop
+#   2. Copy its store to indri:         scp ~/.local/share/heph/heph.db \
+#                                           indri:~/.local/share/heph/heph.db
+#   3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id +
+#      data; hephd regenerates a fresh origin on next start when it is missing):
+#        sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';"
+#   4. Run this role (installs hephd, stages the PWA, loads the launchagent).
+#
+# hephd auto-creates an empty store on first start if none exists, so seeding is
+# optional — skip it only if you intend a fresh, empty hub.
+
+- name: Ensure heph data directory exists
+  ansible.builtin.file:
+    path: "{{ heph_data_dir }}"
+    state: directory
+    mode: '0700'
+
+- name: Check for installed hephd binary
+  ansible.builtin.stat:
+    path: "{{ heph_binary }}"
+  register: heph_binary_stat
+
+# Bootstrap install only when hephd is absent. Thereafter hephd's own
+# --self-update keeps it current; ansible must not fight (or downgrade) it.
+# This builds from source and can take several minutes on a cold cargo cache.
+- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }})
+  ansible.builtin.command:
+    cmd: >-
+      {{ heph_bin_dir }}/cargo install --locked
+      --git {{ heph_repo_url }}
+      --tag {{ heph_version }}
+      heph hephd
+  environment:
+    PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
+    RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}"
+  when: not heph_binary_stat.stat.exists
+  changed_when: true
+  notify: Restart heph
+
+# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir),
+# served directly by hephd. Static files are read from disk per request, so a
+# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN")
+# evicts stale assets on next load.
+- name: Ensure heph cache parent directory exists
+  ansible.builtin.file:
+    path: "{{ heph_pwa_src_dir | dirname }}"
+    state: directory
+    mode: '0755'
+
+- name: Stage heph-pwa source at {{ heph_version }}
+  ansible.builtin.git:
+    repo: "{{ heph_repo_url }}"
+    dest: "{{ heph_pwa_src_dir }}"
+    version: "{{ heph_version }}"
+    depth: 1
+    single_branch: true
+    force: true
+
+- name: Deploy heph LaunchAgent plist
+  ansible.builtin.template:
+    src: heph.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist
+    mode: '0644'
+  notify: Restart heph
+
+- name: Check if heph LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.heph
+  register: heph_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load heph LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
+  when: heph_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
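The role installs hephd only when the binary is absent. After that, upgrades are left to `--self-update`, so re-running the playbook never rebuilds or downgrades it. The bash sketch below shows the same install-once guard. `install_once` is a hypothetical helper, and the `cargo install` line in the usage comment simply repeats the bootstrap task's command with the default variable values filled in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run an install command only when the target binary is
# absent (the same test as heph_binary_stat.stat.exists). Once it exists,
# later runs do nothing, leaving upgrades to the daemon's own self-update.
install_once() {
    local binary="$1"
    shift
    if [ -e "$binary" ]; then
        return 0
    fi
    "$@"
}

# Usage, mirroring the bootstrap task (not run here):
#   export RUSTUP_TOOLCHAIN=stable
#   install_once ~/.cargo/bin/hephd \
#       ~/.cargo/bin/cargo install --locked \
#       --git https://forge.eblu.me/eblume/hephaestus.git --tag v1.2.1 heph hephd
```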
--- a/ansible/roles/heph/templates/heph.plist.j2
+++ b/ansible/roles/heph/templates/heph.plist.j2
@ -0,0 +1,50 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>Label</key>
+	<string>mcquack.eblume.heph</string>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ heph_binary }}</string>
+		<string>--mode</string>
+		<string>server</string>
+		<string>--http-addr</string>
+		<string>{{ heph_http_addr }}</string>
+		<string>--db</string>
+		<string>{{ heph_db }}</string>
+		<string>--socket</string>
+		<string>{{ heph_socket }}</string>
+		<string>--web-root</string>
+		<string>{{ heph_web_root }}</string>
+		<string>--oidc-issuer</string>
+		<string>{{ heph_oidc_issuer }}</string>
+		<string>--oidc-audience</string>
+		<string>{{ heph_oidc_audience }}</string>
+		<string>--self-update</string>
+		<string>--self-update-interval-secs</string>
+		<string>{{ heph_self_update_interval_secs }}</string>
+	</array>
+	<key>RunAtLoad</key>
+	<true/>
+	<key>KeepAlive</key>
+	<true/>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<!-- cargo + toolchain on PATH so --self-update can run `cargo install`. -->
+		<key>PATH</key>
+		<string>{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+		<key>HOME</key>
+		<string>/Users/erichblume</string>
+		<!-- Pin the rustup channel: the launchagent runs without mise, so a bare
+		     cargo shim would otherwise use rustup's (stale) default toolchain. -->
+		<key>RUSTUP_TOOLCHAIN</key>
+		<string>{{ heph_rust_toolchain }}</string>
+	</dict>
+	<key>StandardOutPath</key>
+	<string>{{ heph_log_dir }}/mcquack.heph.out.log</string>
+	<key>StandardErrorPath</key>
+	<string>{{ heph_log_dir }}/mcquack.heph.err.log</string>
+</dict>
+</plist>
--- a/ansible/roles/jellyfin/defaults/main.yml
+++ b/ansible/roles/jellyfin/defaults/main.yml
@ -0,0 +1,30 @@
+---
+# Jellyfin media server configuration
+
+# Port Jellyfin listens on
+jellyfin_port: 8096
+
+# Data directory (standard macOS location)
+jellyfin_data_dir: "{{ ansible_env.HOME }}/Library/Application Support/jellyfin"
+
+# Media path (NFS mount from sifaka)
+jellyfin_media_path: /Volumes/allisonflix
+
+# Homebrew cask application path
+jellyfin_cask_app_path: /Applications/Jellyfin.app
+
+# Binary path inside the cask app
+jellyfin_binary: "{{ jellyfin_cask_app_path }}/Contents/MacOS/jellyfin"
+
+# Web client path (different from binary location in Homebrew cask)
+jellyfin_webdir: "{{ jellyfin_cask_app_path }}/Contents/Resources/jellyfin-web"
+
+# Log directory
+jellyfin_log_dir: "{{ ansible_env.HOME }}/Library/Logs"
+
+# SSO plugin configuration
+jellyfin_sso_plugin_version: "4.0.0.3"
+jellyfin_sso_client_id: jellyfin
+jellyfin_sso_client_secret: ""
+jellyfin_sso_provider_name: authentik
+jellyfin_plugins_dir: "{{ jellyfin_data_dir }}/plugins"
--- a/ansible/roles/jellyfin/handlers/main.yml
+++ b/ansible/roles/jellyfin/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Reload jellyfin
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist
+  changed_when: true
--- a/ansible/roles/jellyfin/tasks/main.yml
+++ b/ansible/roles/jellyfin/tasks/main.yml
@ -0,0 +1,77 @@
+---
+- name: Install Jellyfin via Homebrew cask
+  community.general.homebrew_cask:
+    name: jellyfin
+    state: present
+
+- name: Ensure Jellyfin data directory exists
+  ansible.builtin.file:
+    path: "{{ jellyfin_data_dir }}"
+    state: directory
+    mode: '0755'
+
+- name: Deploy Jellyfin LaunchAgent plist
+  ansible.builtin.template:
+    src: mcquack.jellyfin.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.jellyfin.plist
+    mode: '0644'
+  notify: Reload jellyfin
+
+- name: Check if Jellyfin LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.jellyfin
+  register: jellyfin_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load Jellyfin LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.jellyfin.plist
+  when: jellyfin_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
+
+# SSO plugin installation
+- name: Ensure SSO-Auth plugin directory exists
+  ansible.builtin.file:
+    path: "{{ jellyfin_plugins_dir }}/SSO-Auth_{{ jellyfin_sso_plugin_version }}"
+    state: directory
+    mode: '0755'
+
+- name: Download SSO-Auth plugin archive
+  ansible.builtin.get_url:
+    url: "https://github.com/9p4/jellyfin-plugin-sso/releases/download/v{{ jellyfin_sso_plugin_version }}/sso-authentication_{{ jellyfin_sso_plugin_version }}.zip"
+    dest: "/tmp/sso-authentication_{{ jellyfin_sso_plugin_version }}.zip"
+    mode: '0644'
+
+- name: Extract SSO-Auth plugin
+  ansible.builtin.unarchive:
+    src: "/tmp/sso-authentication_{{ jellyfin_sso_plugin_version }}.zip"
+    dest: "{{ jellyfin_plugins_dir }}/SSO-Auth_{{ jellyfin_sso_plugin_version }}"
+    remote_src: true
+  notify: Reload jellyfin
+
+- name: Ensure plugin configurations directory exists
+  ansible.builtin.file:
+    path: "{{ jellyfin_plugins_dir }}/configurations"
+    state: directory
+    mode: '0755'
+
+- name: Deploy SSO-Auth plugin configuration
+  ansible.builtin.template:
+    src: sso-auth.xml.j2
+    dest: "{{ jellyfin_plugins_dir }}/configurations/SSO-Auth.xml"
+    mode: '0644'
+  notify: Reload jellyfin
+
+# Branding — add SSO login button to login page
+- name: Ensure Jellyfin config directory exists
+  ansible.builtin.file:
+    path: "{{ jellyfin_data_dir }}/config"
+    state: directory
+    mode: '0755'
+
+- name: Deploy Jellyfin branding configuration
+  ansible.builtin.template:
+    src: branding.xml.j2
+    dest: "{{ jellyfin_data_dir }}/config/branding.xml"
+    mode: '0644'
+  notify: Reload jellyfin
--- a/ansible/roles/jellyfin/templates/branding.xml.j2
+++ b/ansible/roles/jellyfin/templates/branding.xml.j2
@ -0,0 +1,7 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- {{ ansible_managed }} -->
+<BrandingOptions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+  <LoginDisclaimer>&lt;form action="/sso/OID/start/{{ jellyfin_sso_provider_name }}"&gt;&lt;button class="raised block emby-button button-submit" type="submit" style="margin:2em 0"&gt;Sign in with Authentik&lt;/button&gt;&lt;/form&gt;</LoginDisclaimer>
+  <CustomCss />
+  <SplashscreenEnabled>false</SplashscreenEnabled>
+</BrandingOptions>
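`LoginDisclaimer` holds HTML, so the template stores the button markup entity-escaped (`&lt;`, `&gt;`) inside the XML element. The bash sketch below produces that escaped form from raw markup. `xml_escape` is a hypothetical helper. Like the template, it leaves double quotes unescaped, because they are legal in XML element content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape markup so it can sit inside an XML element.
# '&' is replaced first so the entities added afterwards are not re-escaped.
# In a sed replacement, '\&' is a literal ampersand (bare '&' means "the match").
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

For example, `xml_escape '<form action="/sso/OID/start/authentik">'` yields the `&lt;form action="/sso/OID/start/authentik"&gt;` prefix used in the template with the default `jellyfin_sso_provider_name`.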
--- a/ansible/roles/jellyfin/templates/mcquack.jellyfin.plist.j2
+++ b/ansible/roles/jellyfin/templates/mcquack.jellyfin.plist.j2
@ -0,0 +1,33 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- {{ ansible_managed }} -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+	<key>Label</key>
+	<string>mcquack.jellyfin</string>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<key>PATH</key>
+		<string>/opt/homebrew/bin:/usr/bin:/bin</string>
+	</dict>
+	<key>ProgramArguments</key>
+	<array>
+		<string>{{ jellyfin_binary }}</string>
+		<string>--service</string>
+		<string>--datadir</string>
+		<string>{{ jellyfin_data_dir }}</string>
+		<string>--webdir</string>
+		<string>{{ jellyfin_webdir }}</string>
+	</array>
+	<key>WorkingDirectory</key>
+	<string>{{ jellyfin_data_dir }}</string>
+	<key>RunAtLoad</key>
+	<true/>
+	<key>KeepAlive</key>
+	<true/>
+	<key>StandardErrorPath</key>
+	<string>{{ jellyfin_log_dir }}/mcquack.jellyfin.err.log</string>
+	<key>StandardOutPath</key>
+	<string>{{ jellyfin_log_dir }}/mcquack.jellyfin.out.log</string>
+</dict>
+</plist>
--- a/ansible/roles/jellyfin/templates/sso-auth.xml.j2
+++ b/ansible/roles/jellyfin/templates/sso-auth.xml.j2
@ -0,0 +1,33 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- {{ ansible_managed }} -->
+<PluginConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+  <SamlConfigs />
+  <OidConfigs>
+    <item>
+      <key><string>{{ jellyfin_sso_provider_name }}</string></key>
+      <value>
+        <PluginConfiguration>
+          <OidEndpoint>https://authentik.ops.eblu.me/application/o/jellyfin</OidEndpoint>
+          <OidClientId>{{ jellyfin_sso_client_id }}</OidClientId>
+          <OidSecret>{{ jellyfin_sso_client_secret }}</OidSecret>
+          <Enabled>true</Enabled>
+          <EnableAuthorization>true</EnableAuthorization>
+          <EnableAllFolders>true</EnableAllFolders>
+          <EnabledFolders />
+          <AdminRoles><string>admins</string></AdminRoles>
+          <Roles />
+          <EnableFolderRoles>false</EnableFolderRoles>
+          <FolderRoleMappings />
+          <RoleClaim>groups</RoleClaim>
+          <OidScopes>
+            <string>openid</string>
+            <string>email</string>
+            <string>profile</string>
+          </OidScopes>
+          <SchemeOverride>https</SchemeOverride>
+          <CanonicalLinks />
+        </PluginConfiguration>
+      </value>
+    </item>
+  </OidConfigs>
+</PluginConfiguration>
--- a/ansible/roles/jellyfin_metrics/defaults/main.yml
+++ b/ansible/roles/jellyfin_metrics/defaults/main.yml
@ -0,0 +1,20 @@
+---
+# Jellyfin metrics collection configuration
+
+# Jellyfin server URL
+jellyfin_metrics_url: "http://localhost:8096"
+
+# Path to file containing Jellyfin API key (should have 600 permissions)
+jellyfin_metrics_api_key_file: "/Users/erichblume/.jellyfin-api-key"
+
+# Metrics collection interval in seconds
+jellyfin_metrics_interval: 60
+
+# Output directory for prometheus textfile collector
+jellyfin_metrics_dir: /opt/homebrew/var/node_exporter/textfile
+
+# Script installation path
+jellyfin_metrics_script: /Users/erichblume/.local/bin/jellyfin-metrics
+
+# Log directory for metrics script output
+jellyfin_metrics_log_dir: /opt/homebrew/var/log
--- a/ansible/roles/jellyfin_metrics/handlers/main.yml
+++ b/ansible/roles/jellyfin_metrics/handlers/main.yml
@ -0,0 +1,6 @@
+---
+- name: Reload jellyfin-metrics
+  ansible.builtin.shell: |
+    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.jellyfin-metrics.plist 2>/dev/null || true
+    launchctl load ~/Library/LaunchAgents/mcquack.eblume.jellyfin-metrics.plist
+  changed_when: true
--- a/ansible/roles/jellyfin_metrics/tasks/main.yml
+++ b/ansible/roles/jellyfin_metrics/tasks/main.yml
@ -0,0 +1,55 @@
+---
+- name: Fetch Jellyfin API key (when running with --tags jellyfin_metrics)
+  ansible.builtin.command:
+    cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/ceywxkcd3z7najsy2nmmbs2vke/credential"
+  delegate_to: localhost
+  register: jellyfin_metrics_api_key_fallback
+  changed_when: false
+  no_log: true
+  check_mode: false
+  when: jellyfin_metrics_api_key is not defined
+
+- name: Set Jellyfin API key fact (fallback)
+  ansible.builtin.set_fact:
+    jellyfin_metrics_api_key: "{{ jellyfin_metrics_api_key_fallback.stdout }}"
+  no_log: true
+  when: jellyfin_metrics_api_key is not defined
+
+- name: Write Jellyfin API key file
+  ansible.builtin.copy:
+    content: "{{ jellyfin_metrics_api_key }}"
+    dest: "{{ jellyfin_metrics_api_key_file }}"
+    mode: '0600'
+  no_log: true
+
+- name: Ensure bin directory exists
+  ansible.builtin.file:
+    path: "{{ jellyfin_metrics_script | dirname }}"
+    state: directory
+    mode: '0755'
+
+- name: Deploy jellyfin metrics collection script
+  ansible.builtin.template:
+    src: jellyfin-metrics.sh.j2
+    dest: "{{ jellyfin_metrics_script }}"
+    mode: '0755'
+  notify: Reload jellyfin-metrics
+
+- name: Deploy jellyfin-metrics LaunchAgent plist
+  ansible.builtin.template:
+    src: jellyfin-metrics.plist.j2
+    dest: ~/Library/LaunchAgents/mcquack.eblume.jellyfin-metrics.plist
+    mode: '0644'
+  notify: Reload jellyfin-metrics
+
+- name: Check if jellyfin-metrics LaunchAgent is loaded
+  ansible.builtin.command: launchctl list mcquack.eblume.jellyfin-metrics
+  register: jellyfin_metrics_launchctl_check
+  changed_when: false
+  failed_when: false
+
+- name: Load jellyfin-metrics LaunchAgent if not loaded
+  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.jellyfin-metrics.plist
+  when: jellyfin_metrics_launchctl_check.rc != 0
+  changed_when: true
+  failed_when: false
--- a/ansible/roles/jellyfin_metrics/templates/jellyfin-metrics.plist.j2
+++ b/ansible/roles/jellyfin_metrics/templates/jellyfin-metrics.plist.j2
@ -4,7 +4,7 @@
 <plist version="1.0">
 <dict>
 	<key>Label</key>
-	<string>mcquack.eblume.plex-metrics</string>
+	<string>mcquack.eblume.jellyfin-metrics</string>
 	<key>EnvironmentVariables</key>
 	<dict>
 		<key>PATH</key>
@ -12,15 +12,15 @@
 	</dict>
 	<key>ProgramArguments</key>
 	<array>
-		<string>{{ plex_metrics_script }}</string>
+		<string>{{ jellyfin_metrics_script }}</string>
 	</array>
 	<key>StartInterval</key>
-	<integer>{{ plex_metrics_interval }}</integer>
+	<integer>{{ jellyfin_metrics_interval }}</integer>
 	<key>RunAtLoad</key>
 	<true/>
 	<key>StandardErrorPath</key>
-	<string>{{ plex_metrics_log_dir }}/plex-metrics.err.log</string>
+	<string>{{ jellyfin_metrics_log_dir }}/jellyfin-metrics.err.log</string>
 	<key>StandardOutPath</key>
-	<string>{{ plex_metrics_log_dir }}/plex-metrics.out.log</string>
+	<string>{{ jellyfin_metrics_log_dir }}/jellyfin-metrics.out.log</string>
 </dict>
 </plist>
--- a/ansible/roles/jellyfin_metrics/templates/jellyfin-metrics.sh.j2
+++ b/ansible/roles/jellyfin_metrics/templates/jellyfin-metrics.sh.j2
@ -0,0 +1,137 @@
+#!/bin/bash
+# {{ ansible_managed }}
+# Collects Jellyfin Media Server metrics for node_exporter textfile collector
+
+set -euo pipefail
+
+JELLYFIN_URL="{{ jellyfin_metrics_url }}"
+API_KEY_FILE="{{ jellyfin_metrics_api_key_file }}"
+OUTPUT_FILE="{{ jellyfin_metrics_dir }}/jellyfin.prom"
+TEMP_FILE="${OUTPUT_FILE}.tmp"
+
+# Read API key from file
+get_api_key() {
+    if [ -f "$API_KEY_FILE" ]; then
+        tr -d '\n' < "$API_KEY_FILE"
+    else
+        echo ""
+    fi
+}
+
+# Make API request with optional API key
+api_request() {
+    local endpoint="$1"
+    local use_auth="${2:-true}"
+    local api_key
+    local url="${JELLYFIN_URL}${endpoint}"
+
+    if [ "$use_auth" = "true" ]; then
+        api_key=$(get_api_key)
+        if [ -n "$api_key" ]; then
+            curl -s -H "Accept: application/json" -H "X-Emby-Token: $api_key" "$url" 2>/dev/null || true
+        else
+            curl -s -H "Accept: application/json" "$url" 2>/dev/null || true
+        fi
+    else
+        curl -s -H "Accept: application/json" "$url" 2>/dev/null || true
+    fi
+}
+
+# Initialize metrics
+jellyfin_up=0
+jellyfin_version=""
+jellyfin_sessions_total=0
+jellyfin_sessions_playing=0
+jellyfin_sessions_paused=0
+jellyfin_transcode_sessions_total=0
+
+# Library metrics will be built dynamically
+library_metrics=""
+
+# Check server health (no auth required)
+health=$(api_request "/health" false)
+if [ "$health" = "Healthy" ]; then
+    jellyfin_up=1
+fi
+
+# Get system info for version (requires auth)
+if [ "$jellyfin_up" -eq 1 ] && [ -f "$API_KEY_FILE" ]; then
+    system_info=$(api_request "/System/Info")
+    if [ -n "$system_info" ]; then
+        jellyfin_version=$(echo "$system_info" | jq -r '.Version // ""' 2>/dev/null || echo "")
+    fi
+
+    # Get library counts (virtual folders)
+    libraries=$(api_request "/Library/VirtualFolders")
+    if [ -n "$libraries" ] && echo "$libraries" | jq -e '.' > /dev/null 2>&1; then
+        # Process each library
+        while IFS=$'\t' read -r lib_name lib_type lib_id; do
+            if [ -n "$lib_name" ] && [ -n "$lib_type" ]; then
+                # Get item count for this library
+                # Map collection type to item type for counting
+                case "$lib_type" in
+                    movies) item_type="Movie" ;;
+                    tvshows) item_type="Series" ;;
+                    music) item_type="MusicAlbum" ;;
+                    *) item_type="" ;;
+                esac
+
+                if [ -n "$item_type" ] && [ -n "$lib_id" ]; then
+                    items=$(api_request "/Items?parentId=${lib_id}&recursive=true&includeItemTypes=${item_type}&limit=0")
+                    item_count=$(echo "$items" | jq -r '.TotalRecordCount // 0' 2>/dev/null || echo "0")
+                    library_metrics="${library_metrics}jellyfin_library_items{library=\"${lib_name}\",type=\"${lib_type}\"} ${item_count}
+"
+                fi
+            fi
+        done < <(echo "$libraries" | jq -r '.[] | [.Name, .CollectionType, .ItemId] | @tsv' 2>/dev/null || true)
+    fi
+
+    # Get active sessions
+    sessions=$(api_request "/Sessions")
+    if [ -n "$sessions" ] && echo "$sessions" | jq -e '.' > /dev/null 2>&1; then
+        jellyfin_sessions_total=$(echo "$sessions" | jq -r 'length')
+
+        # Count playing sessions (NowPlayingItem is present and IsPaused is false)
+        jellyfin_sessions_playing=$(echo "$sessions" | jq -r '[.[] | select(.NowPlayingItem != null and .PlayState.IsPaused == false)] | length')
+
+        # Count paused sessions
+        jellyfin_sessions_paused=$(echo "$sessions" | jq -r '[.[] | select(.NowPlayingItem != null and .PlayState.IsPaused == true)] | length')
+
+        # Count transcode sessions (TranscodingInfo is present)
+        jellyfin_transcode_sessions_total=$(echo "$sessions" | jq -r '[.[] | select(.TranscodingInfo != null)] | length')
+    fi
+fi
+
+# Write metrics
+cat > "$TEMP_FILE" << EOF
+# HELP jellyfin_up Jellyfin Media Server is up and responding
+# TYPE jellyfin_up gauge
+jellyfin_up ${jellyfin_up}
+
+# HELP jellyfin_version_info Jellyfin Media Server version information
+# TYPE jellyfin_version_info gauge
+jellyfin_version_info{version="${jellyfin_version}"} 1
+
+# HELP jellyfin_sessions_total Total number of active Jellyfin sessions
+# TYPE jellyfin_sessions_total gauge
+jellyfin_sessions_total ${jellyfin_sessions_total}
+
+# HELP jellyfin_sessions_playing Number of sessions currently playing
+# TYPE jellyfin_sessions_playing gauge
+jellyfin_sessions_playing ${jellyfin_sessions_playing}
+
+# HELP jellyfin_sessions_paused Number of sessions currently paused
+# TYPE jellyfin_sessions_paused gauge
+jellyfin_sessions_paused ${jellyfin_sessions_paused}
+
+# HELP jellyfin_transcode_sessions_total Number of sessions being transcoded
+# TYPE jellyfin_transcode_sessions_total gauge
+jellyfin_transcode_sessions_total ${jellyfin_transcode_sessions_total}
+
+# HELP jellyfin_library_items Number of items in each Jellyfin library
+# TYPE jellyfin_library_items gauge
+${library_metrics}
+EOF
+
+# Atomic move
+mv "$TEMP_FILE" "$OUTPUT_FILE"
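The script writes `jellyfin.prom.tmp` and then `mv`s it over `jellyfin.prom`, so node_exporter's textfile collector never reads a half-written file. Two facts make this safe:

- `rename(2)` replaces the target atomically when both paths are on the same filesystem. That is why the temp file sits next to the output rather than in `/tmp`.
- The collector only reads files ending in `.prom`, so it never picks up the temp file.

Below is a generic bash sketch of the pattern. `write_prom_atomically` is a hypothetical helper. Using `mktemp` in place of the fixed `.tmp` name is an assumption on my part; it also keeps two overlapping runs from sharing one temp file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: atomically replace a node_exporter textfile with stdin.
write_prom_atomically() {
    local dest="$1" tmp
    # Same directory as $dest, so the final mv is a same-filesystem rename.
    # The random suffix means the name never ends in .prom.
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    chmod 0644 "$tmp"  # mktemp creates 0600; the exporter may run as another user
    mv -f "$tmp" "$dest"
}
```

In the script, this could replace the heredoc-to-`$TEMP_FILE` plus `mv` pair: `write_prom_atomically "$OUTPUT_FILE" << EOF ... EOF`.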
--- a/ansible/roles/minikube/tasks/main.yml
+++ b/ansible/roles/minikube/tasks/main.yml
@ -37,9 +37,9 @@
    msg: "WARNING: Docker does not appear to be running. Please start Docker Desktop manually."
  when: minikube_docker_status.rc != 0

-- name: Check if minikube cluster exists
+- name: Check minikube cluster status
  ansible.builtin.command:
-    cmd: minikube status --format={% raw %}'{{.Host}}'{% endraw %}
+    cmd: minikube status
  register: minikube_status
  changed_when: false
  failed_when: false
@ -63,11 +63,11 @@
  failed_when: false  # Don't fail - may need manual intervention
  when:
    - minikube_docker_status.rc == 0
-    - minikube_status.rc != 0 or 'Running' not in minikube_status.stdout
+    - minikube_status.rc != 0

 - name: Check minikube status after start attempt
  ansible.builtin.command:
-    cmd: minikube status --format={% raw %}'{{.Host}}'{% endraw %}
+    cmd: minikube status
  register: minikube_final_status
  changed_when: false
  failed_when: false
@ -75,7 +75,38 @@
 - name: Warn if minikube failed to start
  ansible.builtin.debug:
    msg: "WARNING: minikube may not have started properly. Run 'minikube start' manually on indri if needed. Status: {{ minikube_final_status.stdout | default('unknown') }}"
-  when: minikube_final_status.rc != 0 or 'Running' not in minikube_final_status.stdout
+  when: minikube_final_status.rc != 0
+
+# The storage-provisioner is a bare Pod (no controller). If the node restarts
+# via Docker Desktop rather than `minikube start`, kubelet brings back static
+# pods (apiserver, etcd) but bare pods like storage-provisioner are lost.
+# `minikube start` on a running cluster is safe and re-applies all addons.
+- name: Check storage-provisioner pod is running
+  ansible.builtin.command:
+    cmd: kubectl -n kube-system get pod storage-provisioner -o jsonpath='{.status.phase}'
+  register: minikube_storage_provisioner
+  changed_when: false
+  failed_when: false
+  when: minikube_final_status.rc == 0
+
+- name: Re-run minikube start to restore addons
+  ansible.builtin.command:
+    cmd: >
+      minikube start
+      --driver={{ minikube_driver }}
+      --container-runtime={{ minikube_container_runtime }}
+      --cpus={{ minikube_cpus }}
+      --memory={{ minikube_memory }}
+      --disk-size={{ minikube_disk_size }}
+      {% for name in minikube_apiserver_names %}
+      --apiserver-names={{ name }}
+      {% endfor %}
+      --apiserver-port={{ minikube_apiserver_port }}
+      --listen-address={{ minikube_listen_address }}
+  when:
+    - minikube_final_status.rc == 0
+    - minikube_storage_provisioner.stdout | default('') != 'Running'
+  changed_when: true

 # Configure containerd to use zot registry as pull-through cache
 # With docker driver, use host.minikube.internal to reach the host
@ -85,32 +116,32 @@
  ansible.builtin.command:
    cmd: minikube ssh --native-ssh=false "sudo mkdir -p /etc/containerd/certs.d/{{ item }}"
  loop:
-    - registry.tail8d86e.ts.net
+    - registry.ops.eblu.me
    - docker.io
    - ghcr.io
    - quay.io
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

-# Private registry (registry.tail8d86e.ts.net) - direct to zot
-- name: Check registry.tail8d86e.ts.net config
+# Private registry (registry.ops.eblu.me) - direct to zot
+- name: Check registry.ops.eblu.me config
  ansible.builtin.command:
-    cmd: minikube ssh --native-ssh=false "cat /etc/containerd/certs.d/registry.tail8d86e.ts.net/hosts.toml 2>/dev/null || echo ''"
+    cmd: minikube ssh --native-ssh=false "cat /etc/containerd/certs.d/registry.ops.eblu.me/hosts.toml 2>/dev/null || echo ''"
  register: minikube_registry_config
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

-- name: Configure registry.tail8d86e.ts.net mirror
+- name: Configure registry.ops.eblu.me mirror
  ansible.builtin.command:
    cmd: |
      minikube ssh --native-ssh=false 'echo "server = \"http://host.minikube.internal:5050\"

      [host.\"http://host.minikube.internal:5050\"]
        capabilities = [\"pull\", \"resolve\", \"push\"]
-        skip_verify = true" | sudo tee /etc/containerd/certs.d/registry.tail8d86e.ts.net/hosts.toml'
+        skip_verify = true" | sudo tee /etc/containerd/certs.d/registry.ops.eblu.me/hosts.toml'
  changed_when: true
  when:
-    - minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+    - minikube_final_status.rc == 0
    - "'host.minikube.internal:5050' not in minikube_registry_config.stdout"
  notify: Restart containerd in minikube

@ -120,7 +151,7 @@
    cmd: minikube ssh --native-ssh=false "cat /etc/containerd/certs.d/docker.io/hosts.toml 2>/dev/null || echo ''"
  register: minikube_dockerio_config
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

 - name: Configure docker.io mirror through zot
  ansible.builtin.command:
@ -132,7 +163,7 @@
        skip_verify = true" | sudo tee /etc/containerd/certs.d/docker.io/hosts.toml'
  changed_when: true
  when:
-    - minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+    - minikube_final_status.rc == 0
    - "'host.minikube.internal:5050' not in minikube_dockerio_config.stdout"
  notify: Restart containerd in minikube

@ -142,7 +173,7 @@
    cmd: minikube ssh --native-ssh=false "cat /etc/containerd/certs.d/ghcr.io/hosts.toml 2>/dev/null || echo ''"
  register: minikube_ghcr_config
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

 - name: Configure ghcr.io mirror through zot
  ansible.builtin.command:
@ -154,7 +185,7 @@
        skip_verify = true" | sudo tee /etc/containerd/certs.d/ghcr.io/hosts.toml'
  changed_when: true
  when:
-    - minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+    - minikube_final_status.rc == 0
    - "'host.minikube.internal:5050' not in minikube_ghcr_config.stdout"
  notify: Restart containerd in minikube

@ -164,7 +195,7 @@
    cmd: minikube ssh --native-ssh=false "cat /etc/containerd/certs.d/quay.io/hosts.toml 2>/dev/null || echo ''"
  register: minikube_quay_config
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

 - name: Configure quay.io mirror through zot
  ansible.builtin.command:
@ -176,7 +207,7 @@
        skip_verify = true" | sudo tee /etc/containerd/certs.d/quay.io/hosts.toml'
  changed_when: true
  when:
-    - minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+    - minikube_final_status.rc == 0
    - "'host.minikube.internal:5050' not in minikube_quay_config.stdout"
  notify: Restart containerd in minikube

@ -188,13 +219,13 @@
    cmd: kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
  register: minikube_api_url
  changed_when: false
-  when: minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+  when: minikube_final_status.rc == 0

 - name: Extract API server port from URL
  ansible.builtin.set_fact:
    minikube_api_port: "{{ minikube_api_url.stdout | regex_search(':([0-9]+)$', '\\1') | first }}"
  when:
-    - minikube_final_status.rc == 0 and 'Running' in minikube_final_status.stdout
+    - minikube_final_status.rc == 0
    - minikube_api_url.stdout is defined

 - name: Check current tailscale serve config for k8s
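The new storage-provisioner tasks re-run `minikube start` only under two conditions. First, `minikube status` must exit 0. Second, the bare `storage-provisioner` Pod must not be in phase `Running`. If the Pod is missing entirely, `kubectl get` prints nothing on stdout, which also counts as "not Running". Below is a bash sketch of that gate. `needs_addon_restore` is a hypothetical helper that takes the status exit code and the phase string (the values the role reads from `minikube_final_status.rc` and `minikube_storage_provisioner.stdout`).

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the `when:` gate on "Re-run minikube start to
# restore addons": succeed only if the cluster is up but the bare
# storage-provisioner Pod is not Running (an empty phase means "not found").
needs_addon_restore() {
    local status_rc="$1" provisioner_phase="${2:-}"
    [ "$status_rc" -eq 0 ] && [ "$provisioner_phase" != "Running" ]
}

# Live usage on indri (not run here):
#   minikube status >/dev/null; rc=$?
#   phase=$(kubectl -n kube-system get pod storage-provisioner \
#       -o jsonpath='{.status.phase}' 2>/dev/null)
#   needs_addon_restore "$rc" "$phase" && echo "would re-run minikube start"
```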
--- a/ansible/roles/minikube_metrics/defaults/main.yml
+++ b/ansible/roles/minikube_metrics/defaults/main.yml
@ -1,6 +1,6 @@
 ---
 minikube_metrics_dir: /opt/homebrew/var/node_exporter/textfile
-minikube_metrics_script: /Users/erichblume/bin/minikube-metrics
+minikube_metrics_script: /Users/erichblume/.local/bin/minikube-metrics
 minikube_metrics_interval: 60  # seconds between metric collection
 minikube_metrics_log_dir: /opt/homebrew/var/log
 minikube_metrics_user_home: /Users/erichblume
--- a/ansible/roles/plex_metrics/defaults/main.yml
+++ b/ansible/roles/plex_metrics/defaults/main.yml
@ -1,20 +0,0 @@
----
-# Plex metrics collection configuration
-
-# Plex server URL
-plex_metrics_url: "http://localhost:32400"
-
-# Path to file containing Plex token (should have 600 permissions)
-plex_metrics_token_file: "/Users/erichblume/.plex-token"
-
-# Metrics collection interval in seconds
-plex_metrics_interval: 60
-
-# Output directory for prometheus textfile collector
-plex_metrics_dir: /opt/homebrew/var/node_exporter/textfile
-
-# Script installation path
-plex_metrics_script: /Users/erichblume/bin/plex-metrics
-
-# Log directory for metrics script output
-plex_metrics_log_dir: /opt/homebrew/var/log
--- a/ansible/roles/plex_metrics/handlers/main.yml
+++ b/ansible/roles/plex_metrics/handlers/main.yml
@ -1,6 +0,0 @@
----
-- name: Reload plex-metrics
-  ansible.builtin.shell: |
-    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.plex-metrics.plist 2>/dev/null || true
-    launchctl load ~/Library/LaunchAgents/mcquack.eblume.plex-metrics.plist
-  changed_when: true
--- a/ansible/roles/plex_metrics/meta/main.yml
+++ b/ansible/roles/plex_metrics/meta/main.yml
@ -1,4 +0,0 @@
----
-# Role ordering is controlled by indri.yml playbook - do not add dependencies here
-# (Ansible's tag accumulation prevents proper deduplication when using meta dependencies)
-dependencies: []
--- a/ansible/roles/plex_metrics/tasks/main.yml
+++ b/ansible/roles/plex_metrics/tasks/main.yml
@ -1,32 +0,0 @@
----
-- name: Ensure bin directory exists
-  ansible.builtin.file:
-    path: "{{ plex_metrics_script | dirname }}"
-    state: directory
-    mode: '0755'
-
- name: Deploy plex metrics collection script
-  ansible.builtin.template:
-    src: plex-metrics.sh.j2
-    dest: "{{ plex_metrics_script }}"
-    mode: '0755'
-  notify: Reload plex-metrics
-
- name: Deploy plex-metrics LaunchAgent plist
-  ansible.builtin.template:
-    src: plex-metrics.plist.j2
-    dest: ~/Library/LaunchAgents/mcquack.eblume.plex-metrics.plist
-    mode: '0644'
-  notify: Reload plex-metrics
-
- name: Check if plex-metrics LaunchAgent is loaded
-  ansible.builtin.command: launchctl list mcquack.eblume.plex-metrics
-  register: plex_metrics_launchctl_check
-  changed_when: false
-  failed_when: false
-
- name: Load plex-metrics LaunchAgent if not loaded
-  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.plex-metrics.plist
-  when: plex_metrics_launchctl_check.rc != 0
-  changed_when: true
-  failed_when: false
--- a/ansible/roles/plex_metrics/templates/plex-metrics.sh.j2
+++ b/ansible/roles/plex_metrics/templates/plex-metrics.sh.j2
@ -1,133 +0,0 @@
-#!/bin/bash
-# {{ ansible_managed }}
-# Collects Plex Media Server metrics for node_exporter textfile collector
-
-set -euo pipefail
-
-PLEX_URL="{{ plex_metrics_url }}"
-TOKEN_FILE="{{ plex_metrics_token_file }}"
-OUTPUT_FILE="{{ plex_metrics_dir }}/plex.prom"
-TEMP_FILE="${OUTPUT_FILE}.tmp"
-
-# Read token from file
-get_token() {
-    if [ -f "$TOKEN_FILE" ]; then
-        cat "$TOKEN_FILE" | tr -d '\n'
-    else
-        echo ""
-    fi
-}
-
-# Make API request with optional token
-api_request() {
-    local endpoint="$1"
-    local use_token="${2:-true}"
-    local token
-    local url="${PLEX_URL}${endpoint}"
-
-    if [ "$use_token" = "true" ]; then
-        token=$(get_token)
-        if [ -n "$token" ]; then
-            curl -s -H "Accept: application/json" -H "X-Plex-Token: $token" "$url" 2>/dev/null
-        else
-            curl -s -H "Accept: application/json" "$url" 2>/dev/null
-        fi
-    else
-        curl -s -H "Accept: application/json" "$url" 2>/dev/null
-    fi
-}
-
-# Initialize metrics
-plex_up=0
-plex_version=""
-plex_sessions_total=0
-plex_sessions_playing=0
-plex_sessions_paused=0
-plex_transcode_sessions_total=0
-plex_transcode_video=0
-plex_transcode_audio=0
-
-# Library metrics will be built dynamically
-library_metrics=""
-
-# Check server identity (no auth required)
-identity=$(api_request "/identity" false)
-if echo "$identity" | jq -e '.MediaContainer.machineIdentifier' > /dev/null 2>&1; then
-    plex_up=1
-    plex_version=$(echo "$identity" | jq -r '.MediaContainer.version // ""')
-fi
-
-# If server is up, get additional metrics (require auth)
-if [ "$plex_up" -eq 1 ] && [ -f "$TOKEN_FILE" ]; then
-    # Get library sections
-    sections=$(api_request "/library/sections")
-
-    # Process each library using jq
-    while IFS=$'\t' read -r lib_key lib_type lib_title; do
-        if [ -n "$lib_key" ] && [ -n "$lib_type" ]; then
-            # Get library details for item count
-            lib_detail=$(api_request "/library/sections/${lib_key}/all?X-Plex-Container-Start=0&X-Plex-Container-Size=0")
-            lib_size=$(echo "$lib_detail" | jq -r '.MediaContainer.totalSize // .MediaContainer.size // 0')
-
-            library_metrics="${library_metrics}plex_library_items{library=\"${lib_title}\",type=\"${lib_type}\"} ${lib_size}
-"
-        fi
-    done < <(echo "$sections" | jq -r '.MediaContainer.Directory[] | [.key, .type, .title] | @tsv' 2>/dev/null || true)
-
-    # Get active sessions
-    sessions=$(api_request "/status/sessions")
-    if echo "$sessions" | jq -e '.MediaContainer' > /dev/null 2>&1; then
-        plex_sessions_total=$(echo "$sessions" | jq -r '.MediaContainer.size // 0')
-
-        # Count playing vs paused
-        plex_sessions_playing=$(echo "$sessions" | jq -r '[.MediaContainer.Metadata[]? | select(.Player.state == "playing")] | length')
-        plex_sessions_paused=$(echo "$sessions" | jq -r '[.MediaContainer.Metadata[]? | select(.Player.state == "paused")] | length')
-
-        # Count transcode sessions
-        plex_transcode_video=$(echo "$sessions" | jq -r '[.MediaContainer.Metadata[]? | select(.TranscodeSession.videoDecision == "transcode")] | length')
-        plex_transcode_audio=$(echo "$sessions" | jq -r '[.MediaContainer.Metadata[]? | select(.TranscodeSession.audioDecision == "transcode")] | length')
-        plex_transcode_sessions_total=$(echo "$sessions" | jq -r '[.MediaContainer.Metadata[]? | select(.TranscodeSession)] | length')
-    fi
-fi
-
-# Write metrics
-cat > "$TEMP_FILE" << EOF
-# HELP plex_up Plex Media Server is up and responding
-# TYPE plex_up gauge
-plex_up ${plex_up}
-
-# HELP plex_version_info Plex Media Server version information
-# TYPE plex_version_info gauge
-plex_version_info{version="${plex_version}"} 1
-
-# HELP plex_sessions_total Total number of active Plex sessions
-# TYPE plex_sessions_total gauge
-plex_sessions_total ${plex_sessions_total}
-
-# HELP plex_sessions_playing Number of sessions currently playing
-# TYPE plex_sessions_playing gauge
-plex_sessions_playing ${plex_sessions_playing}
-
-# HELP plex_sessions_paused Number of sessions currently paused
-# TYPE plex_sessions_paused gauge
-plex_sessions_paused ${plex_sessions_paused}
-
-# HELP plex_transcode_sessions_total Number of sessions being transcoded
-# TYPE plex_transcode_sessions_total gauge
-plex_transcode_sessions_total ${plex_transcode_sessions_total}
-
-# HELP plex_transcode_video_sessions Number of sessions transcoding video
-# TYPE plex_transcode_video_sessions gauge
-plex_transcode_video_sessions ${plex_transcode_video}
-
-# HELP plex_transcode_audio_sessions Number of sessions transcoding audio
-# TYPE plex_transcode_audio_sessions gauge
-plex_transcode_audio_sessions ${plex_transcode_audio}
-
-# HELP plex_library_items Number of items in each Plex library
-# TYPE plex_library_items gauge
-${library_metrics}
-EOF
-
-# Atomic move
-mv "$TEMP_FILE" "$OUTPUT_FILE"
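The removed `plex-metrics.sh.j2` followed the node_exporter textfile-collector pattern. It rendered the Prometheus text exposition format, wrote the result to a temporary file in the same directory, and then renamed that file over the `.prom` file. Because the collector only reads `*.prom` files, it never sees a half-written file. The Python sketch below shows the same pattern. The function names are hypothetical and are not part of the repository. The sketch also escapes label values, which the shell version did not do for `lib_title`: a library title containing a double quote would have produced an unparseable sample line there.

```python
import os
import tempfile
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def escape_label_value(value: str) -> str:
    # Exposition-format escaping for label values: backslash, double quote, newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    # One HELP/TYPE header per metric family, then one line per label set.
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            rendered = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(path: str, body: str) -> None:
    # Write next to the target so the rename stays on one filesystem (atomic),
    # using a non-.prom suffix so the collector ignores the partial file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; keep the output world-readable
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # the rename did not happen, so the temp file is still there
        raise
```

Writing each family with `render_gauge` and concatenating the results reproduces the layout of the old heredoc. The only behavioral difference is the escaping step.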
--- a/ansible/roles/sifaka_exporters/defaults/main.yml
+++ b/ansible/roles/sifaka_exporters/defaults/main.yml
@ -0,0 +1,15 @@
+---
+# Docker images for Prometheus exporters on sifaka NAS
+# Ports are defined in group_vars/all.yml (shared with caddy role)
+sifaka_exporters_docker: /volume1/@appstore/ContainerManager/usr/bin/docker
+sifaka_exporters_node_exporter_image: "prom/node-exporter:latest"
+sifaka_exporters_node_exporter_name: "prom-node-exporter-1"
+sifaka_exporters_smartctl_exporter_image: "prometheuscommunity/smartctl-exporter:latest"
+sifaka_exporters_smartctl_exporter_name: "smartctl-exporter"
+
+# Synology uses /dev/sata* instead of /dev/sd* — smartctl can't auto-detect them
+sifaka_exporters_smartctl_devices:
+  - /dev/sata1
+  - /dev/sata2
+  - /dev/sata3
+  - /dev/sata4
--- a/ansible/roles/sifaka_exporters/handlers/main.yml
+++ b/ansible/roles/sifaka_exporters/handlers/main.yml
@ -0,0 +1,12 @@
+---
+- name: Restart node_exporter
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} restart {{ sifaka_exporters_node_exporter_name }}"
+  become: true
+  listen: Restart node_exporter
+  changed_when: true
+
+- name: Restart smartctl_exporter
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} restart {{ sifaka_exporters_smartctl_exporter_name }}"
+  become: true
+  listen: Restart smartctl_exporter
+  changed_when: true
--- a/ansible/roles/sifaka_exporters/tasks/main.yml
+++ b/ansible/roles/sifaka_exporters/tasks/main.yml
@ -0,0 +1,91 @@
+---
+# Manage Prometheus exporter containers on sifaka NAS
+# Uses command module to avoid requiring docker Python SDK on Synology
+# Requires passwordless sudo for docker — see docs/reference/storage/sifaka.md
+
+# --- node_exporter ---
+
+- name: Pull node_exporter image
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} pull {{ sifaka_exporters_node_exporter_image }}"
+  become: true
+  register: sifaka_exporters_node_pull
+  changed_when: "'Downloaded newer image' in sifaka_exporters_node_pull.stdout"
+
+- name: Check if node_exporter container exists
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} inspect {{ sifaka_exporters_node_exporter_name }} --format {% raw %}'{{.Config.Image}}'{% endraw %}"
+  become: true
+  register: sifaka_exporters_node_inspect
+  changed_when: false
+  failed_when: false
+
+- name: Remove node_exporter container if image changed
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} rm -f {{ sifaka_exporters_node_exporter_name }}"
+  become: true
+  when:
+    - sifaka_exporters_node_inspect.rc == 0
+    - sifaka_exporters_node_inspect.stdout != sifaka_exporters_node_exporter_image
+  changed_when: true
+
+- name: Start node_exporter container
+  ansible.builtin.command:
+    argv:
+      - "{{ sifaka_exporters_docker }}"
+      - run
+      - -d
+      - "--name={{ sifaka_exporters_node_exporter_name }}"
+      - --restart=always
+      - --net=host
+      - "{{ sifaka_exporters_node_exporter_image }}"
+  become: true
+  register: sifaka_exporters_node_start
+  when: >
+    sifaka_exporters_node_inspect.rc != 0 or
+    sifaka_exporters_node_inspect.stdout != sifaka_exporters_node_exporter_image
+  changed_when: sifaka_exporters_node_start.rc == 0
+
+# --- smartctl_exporter ---
+
+- name: Pull smartctl_exporter image
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} pull {{ sifaka_exporters_smartctl_exporter_image }}"
+  become: true
+  register: sifaka_exporters_smartctl_pull
+  changed_when: "'Downloaded newer image' in sifaka_exporters_smartctl_pull.stdout"
+
+- name: Check if smartctl_exporter container exists
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} inspect {{ sifaka_exporters_smartctl_exporter_name }} --format {% raw %}'{{.Config.Image}}'{% endraw %}"
+  become: true
+  register: sifaka_exporters_smartctl_inspect
+  changed_when: false
+  failed_when: false
+
+- name: Remove smartctl_exporter container if image changed
+  ansible.builtin.command: "{{ sifaka_exporters_docker }} rm -f {{ sifaka_exporters_smartctl_exporter_name }}"
+  become: true
+  when:
+    - sifaka_exporters_smartctl_inspect.rc == 0
+    - sifaka_exporters_smartctl_inspect.stdout != sifaka_exporters_smartctl_exporter_image
+  changed_when: true
+
+- name: Build smartctl_exporter device arguments
+  ansible.builtin.set_fact:
+    sifaka_exporters_smartctl_device_args: >-
+      {{ sifaka_exporters_smartctl_devices | map('regex_replace', '^(.*)$', '--smartctl.device=\1') | list }}
+
+- name: Start smartctl_exporter container
+  ansible.builtin.command:
+    argv: >-
+      {{ [
+        sifaka_exporters_docker, 'run', '-d',
+        '--name=' + sifaka_exporters_smartctl_exporter_name,
+        '--restart=always',
+        '--privileged',
+        '--user=root',
+        '-p', sifaka_smartctl_exporter_port | string + ':' + sifaka_smartctl_exporter_port | string,
+        sifaka_exporters_smartctl_exporter_image
+      ] + sifaka_exporters_smartctl_device_args }}
+  become: true
+  register: sifaka_exporters_smartctl_start
+  when: >
+    sifaka_exporters_smartctl_inspect.rc != 0 or
+    sifaka_exporters_smartctl_inspect.stdout != sifaka_exporters_smartctl_exporter_image
+  changed_when: sifaka_exporters_smartctl_start.rc == 0
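The `sifaka_exporters` tasks implement a small reconcile loop using plain `docker` commands:

- inspect the container's configured image;
- remove the container when that reference differs from the desired one;
- run the container when it is missing or was just removed.

The Python sketch below restates that decision logic and the smartctl argv construction, so that each branch can be checked in isolation. The function names are illustrative, and so is the port value 9633 used in the checks. The real port comes from `group_vars/all.yml`.

```python
import re
from typing import List


def smartctl_device_args(devices: List[str]) -> List[str]:
    # Same effect as map('regex_replace', '^(.*)$', '--smartctl.device=\1'):
    # each device path is prefixed with the exporter's --smartctl.device= flag.
    return [re.sub(r"^(.*)$", r"--smartctl.device=\1", d) for d in devices]


def reconcile(inspect_rc: int, current_image: str, desired_image: str) -> List[str]:
    # inspect_rc/current_image correspond to `docker inspect <name> --format '{{.Config.Image}}'`.
    exists = inspect_rc == 0
    differs = current_image != desired_image
    actions: List[str] = []
    if exists and differs:
        actions.append("rm -f")  # "Remove ... container if image changed"
    if not exists or differs:
        actions.append("run")  # "Start ... container"
    return actions


def smartctl_run_argv(docker: str, name: str, port: int, image: str, devices: List[str]) -> List[str]:
    # Mirrors the argv list built in "Start smartctl_exporter container".
    # Device flags come after the image, so they go to the exporter, not to docker.
    return [
        docker, "run", "-d",
        f"--name={name}",
        "--restart=always",
        "--privileged",
        "--user=root",
        "-p", f"{port}:{port}",
        image,
    ] + smartctl_device_args(devices)
```

`{{.Config.Image}}` returns the image reference as it was passed to `docker run` (for example `prom/node-exporter:latest`), not an image ID. Pulling a newer `:latest` digest therefore does not, by itself, make the stored and desired strings differ. As written, the containers are recreated only when the configured reference changes.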
--- a/ansible/roles/tailscale_serve/defaults/main.yml
+++ b/ansible/roles/tailscale_serve/defaults/main.yml
@ -1,17 +0,0 @@
---
-# Tailscale serve configuration for this host
-# Each service maps a Tailscale service name to local endpoints
-
-tailscale_serve_services:
-  - name: svc:forge
-    https:
-      port: 443
-      upstream: http://localhost:3001
-    tcp:
-      port: 22
-      upstream: tcp://localhost:2200
-
-  - name: svc:registry
-    https:
-      port: 443
-      upstream: http://localhost:5050
--- a/ansible/roles/tailscale_serve/meta/main.yml
+++ b/ansible/roles/tailscale_serve/meta/main.yml
@ -1,4 +0,0 @@
---
-# Role ordering is controlled by indri.yml playbook - do not add dependencies here
-# (Ansible's tag accumulation prevents proper deduplication when using meta dependencies)
-dependencies: []
--- a/ansible/roles/tailscale_serve/tasks/main.yml
+++ b/ansible/roles/tailscale_serve/tasks/main.yml
@ -1,38 +0,0 @@
---
- name: Get current tailscale serve status
-  ansible.builtin.command: tailscale serve status --json
-  register: tailscale_serve_status
-  changed_when: false
-
- name: Parse serve status
-  ansible.builtin.set_fact:
-    tailscale_serve_config: "{{ ((tailscale_serve_status.stdout | default('{}', true)) | from_json).Services | default({}) }}"
-
-# Configure HTTPS if service doesn't have Web config yet
- name: Configure HTTPS services
-  ansible.builtin.command: >
-    tailscale serve --service="{{ item.name }}"
-    --https={{ item.https.port }} {{ item.https.upstream }}
-  loop: "{{ tailscale_serve_services }}"
-  when:
-    - item.https is defined
-    - tailscale_serve_config[item.name] is not defined or tailscale_serve_config[item.name].Web is not defined
-  register: tailscale_serve_https_result
-  changed_when: true
-  failed_when: false
-
-# Configure TCP if service doesn't have the specific port configured yet
- name: Configure TCP services
-  ansible.builtin.command: >
-    tailscale serve --service="{{ item.name }}"
-    --tcp={{ item.tcp.port }} {{ item.tcp.upstream }}
-  loop: "{{ tailscale_serve_services }}"
-  when:
-    - item.tcp is defined
-    - tailscale_serve_config[item.name] is not defined or
-      tailscale_serve_config[item.name].TCP is not defined or
-      tailscale_serve_config[item.name].TCP[item.tcp.port | string] is not defined or
-      tailscale_serve_config[item.name].TCP[item.tcp.port | string].TCPForward is not defined
-  register: tailscale_serve_tcp_result
-  changed_when: true
-  failed_when: false
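The deleted `tailscale_serve` tasks decided whether to configure each service by probing the parsed `tailscale serve status --json` output. The Python sketch below restates those `when:` conditions. The `Services`, `Web`, `TCP` and `TCPForward` keys are the ones the Jinja expressions access. The status document used in the checks is a hypothetical shape, not captured CLI output.

```python
import json
from typing import Any, Dict


def parse_services(stdout: str) -> Dict[str, Any]:
    # Mirrors ((stdout | default('{}', true)) | from_json).Services | default({}):
    # empty output is treated as "{}", and a missing Services key becomes {}.
    data = json.loads(stdout or "{}")
    return data.get("Services", {})


def needs_https(services: Dict[str, Any], name: str) -> bool:
    # HTTPS is configured only when the service has no Web block yet.
    return name not in services or "Web" not in services[name]


def needs_tcp(services: Dict[str, Any], name: str, port: int) -> bool:
    # TCP is configured unless Services[name].TCP["<port>"].TCPForward already exists.
    if name not in services:
        return True
    svc = services[name]
    if "TCP" not in svc:
        return True
    entry = svc["TCP"].get(str(port))
    return entry is None or "TCPForward" not in entry
```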
--- a/ansible/roles/zot/defaults/main.yml
+++ b/ansible/roles/zot/defaults/main.yml
@ -5,6 +5,8 @@ zot_data_dir: /Users/erichblume/zot
 zot_config_dir: /Users/erichblume/.config/zot
 zot_port: 5050
 zot_log_dir: /Users/erichblume/Library/Logs
+zot_external_url: https://registry.ops.eblu.me
+zot_oidc_issuer: https://authentik.ops.eblu.me/application/o/zot/

 # Pull-through cache registries (on-demand sync)
 zot_sync_registries:
--- a/ansible/roles/zot/tasks/main.yml
+++ b/ansible/roles/zot/tasks/main.yml
@ -46,6 +46,14 @@
    mode: '0644'
  notify: Restart zot

+- name: Deploy zot OIDC credentials
+  ansible.builtin.template:
+    src: oidc-credentials.json.j2
+    dest: "{{ zot_config_dir }}/oidc-credentials.json"
+    mode: '0600'
+  notify: Restart zot
+  when: zot_oidc_client_secret is defined
+
 - name: Deploy zot LaunchAgent plist
  ansible.builtin.template:
    src: zot.plist.j2
--- a/ansible/roles/zot/templates/config.json.j2
+++ b/ansible/roles/zot/templates/config.json.j2
@ -8,7 +8,44 @@
  },
  "http": {
    "address": "0.0.0.0",
-    "port": "{{ zot_port }}"
+    "port": "{{ zot_port }}",
+    "externalUrl": "{{ zot_external_url }}",
+    "auth": {
+      "openid": {
+        "providers": {
+          "oidc": {
+            "credentialsFile": "{{ zot_config_dir }}/oidc-credentials.json",
+            "issuer": "{{ zot_oidc_issuer }}",
+            "scopes": ["openid", "email", "profile"],
+            "claimMapping": {
+              "username": "preferred_username"
+            }
+          }
+        }
+      },
+      "apikey": true
+    },
+    "accessControl": {
+      "metrics": {
+        "users": [""]
+      },
+      "repositories": {
+        "**": {
+          "policies": [
+            {
+              "groups": ["artifact-workloads"],
+              "actions": ["read", "create"]
+            },
+            {
+              "groups": ["admins"],
+              "actions": ["read", "create", "update", "delete"]
+            }
+          ],
+          "anonymousPolicy": ["read"],
+          "defaultPolicy": ["read"]
+        }
+      }
+    }
  },
  "log": {
    "level": "info"
--- a/ansible/roles/zot/templates/oidc-credentials.json.j2
+++ b/ansible/roles/zot/templates/oidc-credentials.json.j2
@ -0,0 +1,4 @@
+{
+  "clientid": "zot",
+  "clientsecret": "{{ zot_oidc_client_secret }}"
+}
--- a/ansible/roles/zot/templates/zot.plist.j2
+++ b/ansible/roles/zot/templates/zot.plist.j2
@ -16,6 +16,11 @@
 	<true/>
 	<key>KeepAlive</key>
 	<true/>
+	<key>EnvironmentVariables</key>
+	<dict>
+		<key>PATH</key>
+		<string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+	</dict>
 	<key>StandardOutPath</key>
 	<string>{{ zot_log_dir }}/mcquack.zot.out.log</string>
 	<key>StandardErrorPath</key>