Compare commits

11 commits

Author SHA1 Message Date
bc34b601be Merge pull request 'heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)' (#371) from heph-offline-access into main 2026-06-06 18:29:47 -07:00
50a36ff93a heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)
The heph CLI requests scope "openid offline_access", but the Authentik
heph OAuth2 provider only mapped openid/email/profile. Without the
offline_access mapping the issued refresh token is bound to the login
session rather than the 30-day refresh-token window; once the session
lapses, hephd's refresh_token grant returns 400 Bad Request and spoke
sync silently degrades (heph sync --status -> auth_failure: true).

Add the built-in offline_access scope mapping to the provider's
property_mappings and document the requirement in the service reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:07:13 -07:00
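
To make the failure mode in this commit concrete, here is a minimal sketch of an OAuth2 `refresh_token` grant (RFC 6749 section 6) as a public client might send it, plus a classifier for the token endpoint's status code. The function names and the returned dictionary shape are hypothetical and do not reflect hephd's actual code; only the client id `heph` and the scope string come from the commit message.

```python
from urllib.parse import urlencode


def refresh_request_body(refresh_token: str, client_id: str = "heph",
                         scope: str = "openid offline_access") -> str:
    """Form-encode a refresh_token grant for a public client (no client secret)."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "scope": scope,
    })


def classify_refresh(status: int) -> dict:
    """Map a token-endpoint HTTP status to a hypothetical sync status."""
    if status == 200:
        return {"auth_failure": False, "action": "store new tokens"}
    if status == 400:
        # Usually invalid_grant: the refresh token expired or was revoked,
        # which is what a session-bound token looks like after the session ends.
        return {"auth_failure": True, "action": "re-run interactive login"}
    return {"auth_failure": False, "action": "retry later"}
```

Treating a 400 as a hard authentication failure rather than a transient error is the design choice that matters here: retrying a rejected refresh token will not succeed.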
cf63fcb5b5 C0: track heph in service-versions (self-updating; note drift task)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:22:46 -07:00
3abe80523a C0: bump indri heph hub to v1.2.1 (PWA Authentik login + /config)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:40:51 -07:00
6576880b0e heph Authentik: register heph-pwa redirect URIs (PKCE login) (#370)
Adds the heph-pwa redirect URIs to the Authentik `heph` OAuth2 provider so the new browser **Login with Authentik** flow (Authorization Code + PKCE, hephaestus PR #9) can redirect back and exchange the code:

- `https://heph.ops.eblu.me/` (the PWA origin)
- `http://localhost:8787/` (local dev: `hephd --web-root`)

Authentik also keys token-endpoint CORS off these origins, so they're required for the browser token exchange. Additive (the provider was `redirect_uris: []`); harmless until the PWA feature deploys.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #370
2026-06-05 07:30:31 -07:00
a2f1e06224 Add hephaestus sync hub to indri (launchagent, PWA, device-code OIDC) (#369)
Makes indri the canonical **heph** hub for the hub-and-spoke task/context system, deployed as a self-updating LaunchAgent managed by Ansible. Other devices (gilbert) attach as offline-capable spokes.

## What's here
- **`ansible/roles/heph`** (tag `heph`) — bootstrap `cargo install hephd` (only if absent; `--self-update` keeps it current after), version-pinned `heph-pwa` checkout served via `--web-root`, launchagent `mcquack.eblume.heph`:
  ```
  hephd --mode server --http-addr 0.0.0.0:8787 --db … --web-root …
        --oidc-issuer …/o/heph/ --oidc-audience heph
        --self-update --self-update-interval-secs 600
  ```
  `~/.cargo/bin` is on the agent `PATH` so self-update's `cargo install` works.
- **Caddy** — `heph.ops.eblu.me → localhost:8787` (TLS for the PWA secure context).
- **Authentik** — new `heph` **public device-code** OIDC app + `default-device-code-flow` bound to the default brand's `flow_device_code` (verified live: brand `authentik-default`, field currently unset → additive).
- **Docs** — `services/hephaestus.md` (Path-A seeding runbook + spoke caveat), `indri.md`, changelog fragment.

## Three features requested
- **Autoupdate** — 10-min interval (`--self-update-interval-secs 600`).
- **PWA** — `--web-root` (confirmed shipped in v1.2.0).
- **Spoke** — gilbert reconfig documented (post-merge step).

## Deploy plan (not done yet — awaiting review)
1. Seed from gilbert (Path A): `heph daemon stop` → copy `heph.db` → `DELETE FROM meta WHERE key='origin'`.
2. Sync Authentik `apps`/blueprint; verify blueprint status via API (not just logs).
3. `provision-indri --tags heph,caddy` from this branch.
4. Point gilbert at the hub + `heph auth login`.

## Known follow-ups (heph-side, tracked in the Hephaestus project)
- `heph daemon` can't bake hub/spoke config or pass `--self-update-interval-secs` → worked around by the ansible plist.
- Path-A seeding lacks a clean `hephd --owner-id`/seed command → manual `meta.origin` reset for now.
- Self-update moves hephd ahead of the ansible-pinned PWA shell over time (drift; tolerated by the SW cache, revisit on next release).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #369
2026-06-05 06:46:58 -07:00
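
The `meta.origin` reset in step 1 of the deploy plan can be sketched with Python's `sqlite3` module. Only the table name `meta`, the `key` column, and the `origin` key come from the PR text; the `value` column, the sample rows, and the in-memory database are assumptions standing in for a real `heph.db`.

```python
import sqlite3


def reset_origin(conn: sqlite3.Connection) -> int:
    """Delete the device origin row; return the number of rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM meta WHERE key = ?", ("origin",))
    return cur.rowcount


# Demo against an in-memory stand-in for heph.db (schema is assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?)",
    [("origin", "gilbert-device"), ("owner_id", "owner-1")],
)
removed = reset_origin(conn)
remaining = [row[0] for row in conn.execute("SELECT key FROM meta ORDER BY key")]
```

Only the `origin` row is removed, so the owner identity and data survive while the hub gets its own device origin on next start, as the PR describes.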
f6c926f1f5 C0: rebuild external-secrets off main, repoint both clusters to stable tags
indri -> v2.2.0-13895bb (arm64), ringtail -> v2.2.0-13895bb-nix (amd64).
Both deployed images now trace to main commit 13895bb instead of earlier
branch builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:19:20 -07:00
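
The provenance claim in this commit can be checked mechanically. The sketch below assumes the tag scheme visible on this page, `<version>-<7-char sha>[-<variant>]`; that scheme is inferred from the tags shown here, not from a documented convention, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ImageTag(NamedTuple):
    version: str
    short_sha: str
    variant: Optional[str]


TAG_RE = re.compile(r"^(v\d+\.\d+\.\d+)-([0-9a-f]{7})(?:-([a-z0-9]+))?$")


def parse_tag(tag: str) -> ImageTag:
    """Split an image tag into version, short commit sha, and optional variant."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag}")
    return ImageTag(m.group(1), m.group(2), m.group(3))


def traces_to(tag: str, commit_sha: str) -> bool:
    """True when the tag's short sha is a prefix of the given commit sha."""
    return commit_sha.startswith(parse_tag(tag).short_sha)
```

With the values from this page, both `v2.2.0-13895bb` and `v2.2.0-13895bb-nix` trace to commit `13895bb04a`, while the earlier `v2.2.0-0e70a1b` tag does not.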
13895bb04a Localize external-secrets on ringtail (amd64 nix build) (#368)
Follow-up to #367. That PR localized external-secrets but the Dagger build (on indri's Apple Silicon runner) only produces an **arm64** image — and external-secrets also runs on **ringtail (amd64)** via the same shared manifest. This completes the localization so both clusters run the local binary on their native arch.

## Approach (matches the kube-state-metrics dual-build pattern)
- **`containers/external-secrets/default.nix`** (new) — builds the **amd64** image on ringtail's nix-container-builder. `buildGoModule` with Go 1.26 (v2.2.0 requires ≥1.26.1; nixpkgs default is 1.25.x) and `-tags all_providers`, faithful to upstream. Same v2.2.0 source from the forge mirror.
- **`argocd/manifests/external-secrets-ringtail/`** (new) — thin kustomize overlay that reuses the shared indri manifest as a base and overrides **only** the image to the `-nix` (amd64) tag. No manifest duplication.
- **`argocd/apps/external-secrets-ringtail.yaml`** — repointed at the new overlay.

Result: indri → `v2.2.0-…` (arm64, Dagger), ringtail → `v2.2.0-…-nix` (amd64, nix).

## Build
Run #581 built both arches at the branch commit. Verified the nix image is `linux/amd64`, entrypoint = the binary, user 65534.

## Deployed from branch & verified on ringtail (k3s, amd64)
- All 3 pods rolled to the nix amd64 image, `1/1 Running` (no exec-format error → arch correct)
- Controller logs clean
- **Live secret fetch proven:** force-synced `homepage/homepage-grafana` → `refreshTime` advanced, `Ready=True`
- **All 20** ringtail ExternalSecrets remain `SecretSynced=True`

## Post-merge
The `external-secrets-ringtail` app is temporarily pointed at this branch + overlay path (apps app left on `main`, manual-sync, untouched). After merge:
```
argocd app sync apps                       # picks up the new Application path on main
argocd app set external-secrets-ringtail --revision main && argocd app sync external-secrets-ringtail
```
I'll also rebuild off `main` so both clusters land on stable main-sha tags (as done for indri in #367).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #368
2026-06-04 15:37:42 -07:00
30c82079b9 C0: rebuild external-secrets image off main (v2.2.0-0e70a1b)
Repoint to the main-branch-built image so the deployed tag traces to a main
commit rather than the merged feature branch. Same v2.2.0 source, stable
provenance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:59:17 -07:00
0e70a1b524 Localize external-secrets container (native container.py build) (#367)
Knocks out the weekly "pick one non-local container and make it local" task by moving **external-secrets** off `ghcr.io` onto a locally-built image, under our own supply-chain control. Doubles as its overdue service review.

## What changed
- **`containers/external-secrets/container.py`** (new) — native Dagger build (the Dockerfile→container.py migration pattern). Clones the forge mirror at `v2.2.0` and builds the single `all_providers` static Go binary, faithful to upstream's `make build` (CGO off, no version ldflags upstream). ENTRYPOINT is `/bin/external-secrets` so the controller/webhook/cert-controller Deployments select their role via `args:` exactly as before.
- **`argocd/manifests/external-secrets/kustomization.yaml`** — image swapped to `registry.ops.eblu.me/blumeops/external-secrets:v2.2.0-2985007`. **Like-for-like (v2.2.0)**, not an upgrade.
- **`service-versions.yaml`** — marked reviewed (2026-06-04), noted the local build.

## Build
Built on the indri forge runner (run #579, ~4 min) → pushed to Zot. Image config verified: `Entrypoint=/bin/external-secrets`, `User=65534`, version label `v2.2.0`.

## Deployed from branch & verified
- All 3 pods (controller / webhook / cert-controller) rolled to the local image, `1/1 Running`
- Controller + webhook logs clean (no errors; webhook serving TLS)
- **End-to-end secret fetch proven:** force-synced `monitoring/grafana-admin` → `refreshTime` advanced to now, `Ready=True`
- All 10 ExternalSecrets cluster-wide remain `SecretSynced=True` — no collateral damage
- App `Healthy`

## Post-merge
`external-secrets` currently points at this branch (so `apps` reads OutOfSync — expected). After merge:
```
argocd app set external-secrets --revision main && argocd app sync external-secrets
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #367
2026-06-04 14:55:55 -07:00
bb55fa9566 Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)
- **cluster.md** — Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md** — corrected image references: these are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review
- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases — CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)
The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump — left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify
Not yet deployed — `nvidia-device-plugin` still points at `main`. After review:
```
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
2026-06-04 13:37:02 -07:00
30 changed files with 601 additions and 17 deletions

@ -260,5 +260,7 @@
   tags: cv
 - role: docs
   tags: docs
+- role: heph
+  tags: heph
 - role: caddy
   tags: caddy

@ -52,6 +52,9 @@ caddy_services:
 - name: devpi
   host: "pypi.{{ caddy_domain }}"
   backend: "http://localhost:3141"
+- name: heph
+  host: "heph.{{ caddy_domain }}"
+  backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell
 - name: kiwix
   host: "kiwix.{{ caddy_domain }}"
   backend: "https://kiwix.tail8d86e.ts.net"

@ -0,0 +1,49 @@
---
# hephaestus hub — the canonical heph replica (server mode) on indri.
# Other devices (e.g. gilbert) are spokes that sync against this hub.
# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
# Pinned release used for the initial `cargo install` and the PWA shell.
# After bootstrap, hephd's own --self-update keeps the binary current; this
# pin only governs the first install and the bundled PWA shell version.
heph_version: v1.2.1
# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
# install and unattended self-update build from the same source (no ssh-agent).
heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
heph_bin_dir: /Users/erichblume/.cargo/bin
heph_binary: "{{ heph_bin_dir }}/hephd"
# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
# launchagent and ansible run in falls back to rustup's *default* toolchain,
# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
# channel explicitly so both the bootstrap build and unattended self-update
# always use a current toolchain regardless of the host's rustup default.
heph_rust_toolchain: stable
heph_data_dir: /Users/erichblume/.local/share/heph
heph_db: "{{ heph_data_dir }}/heph.db"
heph_socket: "{{ heph_data_dir }}/hephd.sock"
heph_log_dir: /Users/erichblume/Library/Logs
# Version-pinned source checkout; the PWA static shell is served directly from
# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
# Hub listens on all interfaces so tailnet spokes can reach it directly
# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
# Access is gated by Authentik OIDC regardless — tailnet reachability is not
# enough (this is the owner's most sensitive data).
heph_http_addr: 0.0.0.0:8787
heph_port: 8787
heph_external_url: https://heph.ops.eblu.me
# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
# the device-code client id (see argocd/manifests/authentik heph blueprint).
heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/
heph_oidc_audience: heph
# Self-update poll interval (seconds). 10 minutes.
heph_self_update_interval_secs: 600

@ -0,0 +1,6 @@
---
- name: Restart heph
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
  changed_when: true

@ -0,0 +1,82 @@
---
# hephaestus hub (server mode) on indri.
#
# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub
# adopts gilbert's existing data instead of being born empty):
#
# 1. On the seed device (gilbert): heph daemon stop
# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \
# indri:~/.local/share/heph/heph.db
# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id +
# data; hephd regenerates a fresh origin on next start when it is missing):
# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';"
# 4. Run this role (installs hephd, stages the PWA, loads the launchagent).
#
# hephd auto-creates an empty store on first start if none exists, so seeding is
# optional — skip it only if you intend a fresh, empty hub.
- name: Ensure heph data directory exists
  ansible.builtin.file:
    path: "{{ heph_data_dir }}"
    state: directory
    mode: '0700'

- name: Check for installed hephd binary
  ansible.builtin.stat:
    path: "{{ heph_binary }}"
  register: heph_binary_stat

# Bootstrap install only when hephd is absent. Thereafter hephd's own
# --self-update keeps it current; ansible must not fight (or downgrade) it.
# This builds from source and can take several minutes on a cold cargo cache.
- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }})
  ansible.builtin.command:
    cmd: >-
      {{ heph_bin_dir }}/cargo install --locked
      --git {{ heph_repo_url }}
      --tag {{ heph_version }}
      heph hephd
  environment:
    PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
    RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}"
  when: not heph_binary_stat.stat.exists
  changed_when: true
  notify: Restart heph

# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir),
# served directly by hephd. Static files are read from disk per request, so a
# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN")
# evicts stale assets on next load.
- name: Ensure heph cache parent directory exists
  ansible.builtin.file:
    path: "{{ heph_pwa_src_dir | dirname }}"
    state: directory
    mode: '0755'

- name: Stage heph-pwa source at {{ heph_version }}
  ansible.builtin.git:
    repo: "{{ heph_repo_url }}"
    dest: "{{ heph_pwa_src_dir }}"
    version: "{{ heph_version }}"
    depth: 1
    single_branch: true
    force: true

- name: Deploy heph LaunchAgent plist
  ansible.builtin.template:
    src: heph.plist.j2
    dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist
    mode: '0644'
  notify: Restart heph

- name: Check if heph LaunchAgent is loaded
  ansible.builtin.command: launchctl list mcquack.eblume.heph
  register: heph_launchctl_check
  changed_when: false
  failed_when: false

- name: Load heph LaunchAgent if not loaded
  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
  when: heph_launchctl_check.rc != 0
  changed_when: true
  failed_when: false
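
The "install only when absent" rule in this role can be summarized as a small decision function. This is an illustrative sketch, not part of the role: the function name and signature are hypothetical, and the argv simply mirrors the `cargo install` command in the bootstrap task.

```python
from typing import List, Optional


def bootstrap_command(binary_exists: bool, cargo: str, repo_url: str,
                      version: str) -> Optional[List[str]]:
    """Return the cargo install argv, or None when hephd is already installed."""
    if binary_exists:
        # Leave the binary alone: self-update may have moved it past `version`,
        # and reinstalling the pinned tag would downgrade it.
        return None
    return [cargo, "install", "--locked", "--git", repo_url,
            "--tag", version, "heph", "hephd"]
```

Returning `None` for an existing binary is what keeps Ansible from fighting the daemon's own `--self-update`.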

@ -0,0 +1,50 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.heph</string>
<key>ProgramArguments</key>
<array>
<string>{{ heph_binary }}</string>
<string>--mode</string>
<string>server</string>
<string>--http-addr</string>
<string>{{ heph_http_addr }}</string>
<string>--db</string>
<string>{{ heph_db }}</string>
<string>--socket</string>
<string>{{ heph_socket }}</string>
<string>--web-root</string>
<string>{{ heph_web_root }}</string>
<string>--oidc-issuer</string>
<string>{{ heph_oidc_issuer }}</string>
<string>--oidc-audience</string>
<string>{{ heph_oidc_audience }}</string>
<string>--self-update</string>
<string>--self-update-interval-secs</string>
<string>{{ heph_self_update_interval_secs }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<!-- cargo + toolchain on PATH so --self-update can run `cargo install`. -->
<key>PATH</key>
<string>{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
<key>HOME</key>
<string>/Users/erichblume</string>
<!-- Pin the rustup channel: the launchagent runs without mise, so a bare
cargo shim would otherwise use rustup's (stale) default toolchain. -->
<key>RUSTUP_TOOLCHAIN</key>
<string>{{ heph_rust_toolchain }}</string>
</dict>
<key>StandardOutPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.out.log</string>
<key>StandardErrorPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.err.log</string>
</dict>
</plist>
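
As a cross-check on the template above, Python's standard `plistlib` can build and round-trip an equivalent property list. This is a reduced sketch under stated assumptions: it covers only a subset of the keys and arguments, the helper name is hypothetical, and the paths are the rendered defaults from this role.

```python
import plistlib


def heph_launchagent(binary: str, http_addr: str, db: str, interval_secs: int) -> dict:
    """Build a reduced LaunchAgent dict; every ProgramArguments entry must be a string."""
    return {
        "Label": "mcquack.eblume.heph",
        "ProgramArguments": [
            binary, "--mode", "server",
            "--http-addr", http_addr,
            "--db", db,
            "--self-update",
            "--self-update-interval-secs", str(interval_secs),
        ],
        "RunAtLoad": True,
        "KeepAlive": True,
    }


agent = heph_launchagent(
    "/Users/erichblume/.cargo/bin/hephd",
    "0.0.0.0:8787",
    "/Users/erichblume/.local/share/heph/heph.db",
    600,
)
xml_bytes = plistlib.dumps(agent)       # XML is plistlib's default output format
round_trip = plistlib.loads(xml_bytes)  # parses back to an equal dict
```

Note the `str(interval_secs)`: the interval is an integer in the Ansible defaults but is passed to the daemon as a string argument.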

@ -15,7 +15,7 @@ spec:
 source:
   repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
   targetRevision: main
-  path: argocd/manifests/external-secrets
+  path: argocd/manifests/external-secrets-ringtail
 destination:
   server: https://ringtail.tail8d86e.ts.net:6443
   namespace: external-secrets

@ -434,3 +434,93 @@ data:
provider: !KeyOf mealie-provider
meta_launch_url: https://meals.ops.eblu.me
policy_engine_mode: all
heph.yaml: |
  version: 1
  metadata:
    name: BlumeOps Heph SSO
    labels:
      blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow"
  entries:
    # Device-code flow (RFC 8628). authentik ships no default for this, so we
    # create one and bind it to the brand below. An empty stage_configuration
    # flow is sufficient: the already-authenticated user just confirms the code.
    - model: authentik_flows.flow
      id: device-code-flow
      identifiers:
        slug: default-device-code-flow
      attrs:
        name: Device code flow
        title: Device code flow
        slug: default-device-code-flow
        designation: stage_configuration
        authentication: require_authenticated

    # Enable the device-code grant globally by binding the flow to the default
    # brand (domain authentik-default). Partial update — only sets this field.
    - model: authentik_brands.brand
      identifiers:
        domain: authentik-default
      attrs:
        flow_device_code: !KeyOf device-code-flow

    # OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret).
    # client_id doubles as the token audience the hub verifies (--oidc-audience heph),
    # and the app slug 'heph' is the issuer path (/application/o/heph/).
    - model: authentik_providers_oauth2.oauth2provider
      id: heph-provider
      identifiers:
        name: Heph
      attrs:
        name: Heph
        authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]]
        invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]]
        client_type: public
        client_id: heph
        # CLI/TUI use the device-code grant (no redirect). The heph-pwa browser
        # login uses Authorization Code + PKCE, which DOES redirect back to the
        # app's origin — register those here (Authentik also keys token-endpoint
        # CORS off these origins). Trailing slash matters: the PWA's redirect_uri
        # is its base dir, e.g. https://heph.ops.eblu.me/.
        redirect_uris:
          - matching_mode: strict
            url: https://heph.ops.eblu.me/
          - matching_mode: strict
            url: http://localhost:8787/ # local dev (hephd --web-root)
        signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]]
        property_mappings:
          - !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]]
          - !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]]
          - !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]]
          # offline_access: heph CLI requests "openid offline_access"; without
          # this mapping the refresh token is session-bound and hephd's
          # refresh_token grant 400s once the session lapses (spoke sync dies).
          - !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]]
        sub_mode: hashed_user_id
        include_claims_in_id_token: true

    # Heph application — linked to the OAuth2 provider
    - model: authentik_core.application
      id: heph-app
      identifiers:
        slug: heph
      attrs:
        name: Hephaestus
        slug: heph
        provider: !KeyOf heph-provider
        meta_launch_url: https://heph.ops.eblu.me
        policy_engine_mode: any

    # Policy binding — restrict heph to admins group (single-owner, sensitive data)
    - model: authentik_policies.policybinding
      identifiers:
        order: 0
        target: !KeyOf heph-app
        group: !Find [authentik_core.group, [name, admins]]
      attrs:
        target: !KeyOf heph-app
        group: !Find [authentik_core.group, [name, admins]]
        order: 0
        enabled: true
        negate: false
        timeout: 30
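
The "trailing slash matters" note in the provider comments can be illustrated with a tiny model of strict redirect matching. This sketch treats `strict` as exact string equality with no normalization, which is an assumption about Authentik's behaviour rather than a copy of its implementation; the function name is hypothetical.

```python
from typing import List

# The two URIs registered in the blueprint above.
REGISTERED = ["https://heph.ops.eblu.me/", "http://localhost:8787/"]


def strict_match(registered: List[str], requested: str) -> bool:
    """Strict mode, modelled as exact string equality (no normalization)."""
    return requested in registered
```

Under this model a client sending `https://heph.ops.eblu.me` (no trailing slash) would be rejected even though it names the same origin, which is why the registered value must match the PWA's `redirect_uri` character for character.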

@ -0,0 +1,16 @@
# Ringtail (amd64) overlay for external-secrets.
#
# Reuses the shared indri manifest as a base and only overrides the controller
# image to the nix-built amd64 variant (`-nix` tag). The base sets the arm64
# image (built via containers/external-secrets/container.py on indri's Dagger
# runner); ringtail's k3s is amd64 and needs the image built by
# containers/external-secrets/default.nix on the nix-container-builder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../external-secrets
images:
  - name: registry.ops.eblu.me/blumeops/external-secrets
    newTag: v2.2.0-13895bb-nix
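
To show why the overlay matches on the registry name rather than the upstream `ghcr.io` name, the sketch below models a kustomize `images:` entry as a function and applies the base and overlay in sequence. It is a simplified model (no digest handling), the function is hypothetical, and the upstream input tag `v2.2.0` is assumed from the base kustomization's previous `newTag`.

```python
from typing import Optional


def override_image(image: str, name: str, new_name: Optional[str] = None,
                   new_tag: Optional[str] = None) -> str:
    """Rewrite `image` if its repository equals `name`; otherwise return it unchanged."""
    repo, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:  # no tag present (any colon belonged to a registry port)
        repo, tag = image, ""
    if repo != name:
        return image
    repo = new_name or repo
    tag = new_tag or tag
    return f"{repo}:{tag}" if tag else repo


# Base (indri): rename the upstream image and pin the arm64 tag.
base = override_image(
    "ghcr.io/external-secrets/external-secrets:v2.2.0",
    "ghcr.io/external-secrets/external-secrets",
    new_name="registry.ops.eblu.me/blumeops/external-secrets",
    new_tag="v2.2.0-13895bb",
)
# Overlay (ringtail): the image now carries the registry name, so match on that.
overlay = override_image(
    base,
    "registry.ops.eblu.me/blumeops/external-secrets",
    new_tag="v2.2.0-13895bb-nix",
)
```

The overlay runs after the base has already renamed the image, so an entry keyed on the `ghcr.io` name would no longer match anything.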

@ -12,4 +12,5 @@ resources:
 images:
   - name: ghcr.io/external-secrets/external-secrets
-    newTag: v2.2.0
+    newName: registry.ops.eblu.me/blumeops/external-secrets
+    newTag: v2.2.0-13895bb

@ -10,4 +10,4 @@ resources:
 images:
   - name: nvcr.io/nvidia/k8s-device-plugin
-    newTag: v0.19.0
+    newTag: v0.19.2

@ -0,0 +1,51 @@
"""External Secrets Operator — native Dagger build.

Two-stage build: Go binary (all providers), Alpine runtime.
Source cloned from forge mirror.

A single binary serves as the controller, webhook, and cert-controller; the
Deployments select the role via a subcommand passed in `args:`, so the image
ENTRYPOINT must be the binary itself (matching upstream's distroless image).
"""

import dagger

from blumeops.containers import (
    alpine_runtime,
    clone_from_forge,
    go_build,
    oci_labels,
)

VERSION = "v2.2.0"


async def build(src: dagger.Directory) -> dagger.Container:
    source = clone_from_forge("external-secrets", VERSION)

    # Upstream `make build` compiles every secret provider into a single
    # static binary (`-tags all_providers`, CGO disabled). Mirror that so the
    # local image is functionally identical to ghcr.io/.../external-secrets.
    backend = go_build(
        source,
        "/external-secrets",
        tags="all_providers",
    )

    runtime = alpine_runtime(
        extra_apk=["ca-certificates"],
        create_user=False,
    )
    runtime = oci_labels(
        runtime,
        title="External Secrets Operator",
        description=(
            "Kubernetes operator that integrates external secret management systems"
        ),
        version=VERSION,
    )

    return (
        runtime.with_file("/bin/external-secrets", backend.file("/external-secrets"))
        .with_user("65534")
        .with_entrypoint(["/bin/external-secrets"])
    )

@ -0,0 +1,56 @@
# Nix-built External Secrets Operator (amd64, for ringtail k3s).
# Builds v2.2.0 from the forge mirror with all secret providers compiled in,
# faithful to upstream's `make build` (-tags all_providers). The container.py
# sibling builds the arm64 image for indri's minikube; this default.nix builds
# the amd64 image on ringtail's nix-container-builder.
{ pkgs ? import <nixpkgs> { } }:
let
version = "2.2.0";
src = pkgs.fetchgit {
url = "https://forge.ops.eblu.me/mirrors/external-secrets.git";
rev = "v${version}";
hash = "sha256-eAocOAp5s4CFRrpKfQr2lf3Ji+6nQQ1A5/eTw5B7v9U=";
};
# external-secrets v2.2.0 requires Go >= 1.26.1; nixpkgs default go is 1.25.x.
external-secrets = (pkgs.buildGoModule.override { go = pkgs.go_1_26; }) {
inherit src version;
pname = "external-secrets";
vendorHash = "sha256-0xuBK3fjAplPLAElHvKB6d+2lDz+De/s91fV4dPZwjE=";
doCheck = false;
subPackages = [ "." ];
tags = [ "all_providers" ];
ldflags = [ "-s" "-w" ];
meta = with pkgs.lib; {
description = "Kubernetes operator that integrates external secret management systems";
homepage = "https://github.com/external-secrets/external-secrets";
license = licenses.asl20;
mainProgram = "external-secrets";
};
};
in
pkgs.dockerTools.buildLayeredImage {
name = "blumeops/external-secrets";
contents = [
external-secrets
pkgs.cacert
pkgs.tzdata
];
config = {
Entrypoint = [ "${external-secrets}/bin/external-secrets" ];
Env = [
"SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
"TZDIR=${pkgs.tzdata}/share/zoneinfo"
];
User = "65534";
};
}

@ -0,0 +1 @@
Rebuilt the locally-built external-secrets image from the `main` branch so the deployed tag (`v2.2.0-0e70a1b`) traces to a `main` commit rather than the now-merged feature branch, giving a stable provenance reference.

@ -0,0 +1 @@
Rebuilt the external-secrets images off `main` and repointed both clusters to the stable main-sha tags (`v2.2.0-13895bb` arm64 / `v2.2.0-13895bb-nix` amd64), so the deployed images on indri and ringtail trace to the same `main` commit rather than earlier feature-branch builds.

@ -0,0 +1 @@
Bumped the indri heph hub to v1.2.1, which adds the hub `GET /config` endpoint and ships the heph-pwa **Login with Authentik** flow (Authorization Code + PKCE). Pairs with the Authentik `heph` provider redirect URIs registered earlier.

@ -0,0 +1 @@
Completed the external-secrets localization for the ringtail (amd64) cluster. The indri Dagger build (`container.py`) only produces an arm64 image; added `containers/external-secrets/default.nix` to build the amd64 variant on ringtail's nix-container-builder, and gave `external-secrets-ringtail` a thin kustomize overlay that reuses the shared manifest and points at the `-nix` image. Both clusters now run the locally-built external-secrets binary on their native architecture.

@ -0,0 +1 @@
Added the [[hephaestus]] (`heph`) sync hub to indri as a self-updating LaunchAgent managed by Ansible (`ansible/roles/heph`, tag `heph`). The hub runs `hephd --mode server` behind `heph.ops.eblu.me` (Caddy TLS), with self-update on a 10-minute interval and the heph-pwa mobile shell served from `--web-root`. Access is gated by a new Authentik device-code (RFC 8628) OIDC application. Indri is now the canonical hub; other devices (e.g. gilbert) attach as offline-capable spokes. The hub's store was seeded from gilbert via the data-safe Path A bring-up (copy store, reset `meta.origin`).
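
For context on the device-code (RFC 8628) flow mentioned in this entry, the sketch below shows the polling decision a client makes after each token request. The error codes and the 5-second `slow_down` backoff come from RFC 8628 section 3.5; the function name and the `("done" | "wait" | "fail", interval)` return shape are illustrative and not taken from the heph CLI.

```python
from typing import Optional, Tuple


def next_poll(error: Optional[str], interval: int) -> Tuple[str, int]:
    """Decide the next step after one device-code token poll.

    `error` is the OAuth error code from the token endpoint, or None on success.
    Returns (action, interval_in_seconds_for_the_next_poll).
    """
    if error is None:
        return ("done", interval)
    if error == "authorization_pending":
        return ("wait", interval)      # user has not confirmed the code yet
    if error == "slow_down":
        return ("wait", interval + 5)  # RFC 8628: add 5 seconds to the interval
    # access_denied, expired_token, and anything unrecognized end the flow.
    return ("fail", interval)
```

A caller would sleep for the returned interval between polls and stop on `done` or `fail`.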

@ -0,0 +1 @@
Granted the `offline_access` scope on the Authentik `heph` OAuth2 provider so hephaestus spokes receive a durable 30-day refresh token. Previously the refresh token was session-bound, so spoke sync would silently fail with a `400 Bad Request` on the `refresh_token` grant once the Authentik session lapsed.

@ -0,0 +1 @@
Registered the heph-pwa redirect URIs (`https://heph.ops.eblu.me/`, plus `http://localhost:8787/` for dev) on the Authentik `heph` OAuth2 provider, enabling the PWA's new Authorization Code + PKCE "Login with Authentik" flow (and the token-endpoint CORS it needs). Pairs with hephaestus PR #9.

@ -0,0 +1 @@
Localized the external-secrets controller image. It now builds from the forge mirror via a native Dagger `container.py` (single `all_providers` static Go binary, faithful to upstream's `make build`) and is served from `registry.ops.eblu.me/blumeops/external-secrets` instead of `ghcr.io`, bringing another platform component under local supply-chain control.

@ -0,0 +1 @@
Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration.

@ -0,0 +1 @@
Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup).

@ -33,6 +33,7 @@ Primary BlumeOps server. Mac Mini M1 (2020).
 - [[alloy|Alloy]] - Metrics/logs collector
 - [[caddy]] - Reverse proxy for `*.ops.eblu.me`
 - [[devpi]] - PyPI mirror (LaunchAgent)
+- [[hephaestus]] - heph task/context sync hub (LaunchAgent, self-updating)
 - [[cv]] - Static CV site, served by Caddy
 - [[docs]] - Quartz-built docs site, served by Caddy

@ -1,6 +1,7 @@
 ---
 title: Cluster
-modified: 2026-02-19
+modified: 2026-06-04
+last-reviewed: 2026-06-04
 tags:
   - kubernetes
 ---
@ -15,7 +16,7 @@ BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most ser
 |----------|-------|
 | **Driver** | docker |
 | **Container Runtime** | docker |
-| **Kubernetes Version** | v1.34.0 |
+| **Kubernetes Version** | v1.35.0 |
 | **CPUs** | 6 |
 | **Memory** | 11GB |
 | **Disk** | 200GB |
@ -41,7 +42,9 @@ Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringt
 |----------|-------|
 | **Context** | `k3s-ringtail` |
 | **API Server** | `https://ringtail.tail8d86e.ts.net:6443` |
-| **Workloads** | Frigate (GPU), ntfy, frigate-notify, nvidia-device-plugin |
+| **Workloads** | GPU workloads (Frigate, Ollama), notifications (ntfy, frigate-notify), [[authentik]], and services migrated off indri minikube (Immich, Mealie, Paperless, TeslaMate). See [[ringtail]] for the authoritative list. |
+Services are being progressively migrated from indri's minikube to ringtail's k3s; the split above reflects an in-progress state, not a fixed boundary.
 ## Related

@ -1,6 +1,7 @@
--- ---
title: Alloy title: Alloy
modified: 2026-03-13 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- observability - observability
@ -20,10 +21,10 @@ Unified observability collector for metrics and logs with three deployments:
| **Indri Binary** | `~/.local/bin/alloy` | | **Indri Binary** | `~/.local/bin/alloy` |
| **Indri Config** | `~/.config/grafana-alloy/config.alloy` | | **Indri Config** | `~/.config/grafana-alloy/config.alloy` |
| **K8s Namespace** | `alloy` | | **K8s Namespace** | `alloy` |
| **K8s Image** | `grafana/alloy:v1.14.0` | | **K8s Image** | `registry.ops.eblu.me/blumeops/alloy:v1.16.0-9564435` (locally built) |
| **ArgoCD App** | `alloy-k8s` | | **ArgoCD App** | `alloy-k8s` |
| **Fly.io Config** | `fly/alloy.river` | | **Fly.io Config** | `fly/alloy.river` |
| **Fly.io Image** | `grafana/alloy:v1.5.1` (binary copied into nginx container) | | **Fly.io Image** | `grafana/alloy:v1.16.1` (binary copied into nginx container, sha-pinned) |
## Metrics Collected ## Metrics Collected

@ -0,0 +1,141 @@
---
title: Hephaestus
modified: 2026-06-04
last-reviewed: 2026-06-04
tags:
- service
- hephaestus
---
# Hephaestus
[hephaestus](https://github.com/eblume/hephaestus) (`heph`) is the user's
self-hosted task + context/knowledge system. It is **hub-and-spoke**: each device
runs a full local SQLite replica (`hephd --mode local`) and background-syncs
against one canonical **hub**. Indri runs that hub.
## Quick Reference
| Property | Value |
|----------|-------|
| **PWA URL** | https://heph.ops.eblu.me (browser PWA, Caddy TLS) |
| **Spoke sync URL** | http://indri.tail8d86e.ts.net:8787 (direct, tailnet) |
| **Local Port** | 8787 (`hephd --mode server`, bound `0.0.0.0`) |
| **Binary** | `~/.cargo/bin/hephd` (self-updating) |
| **Data** | `~/.local/share/heph/heph.db` |
| **PWA shell** | `~/.local/share/heph/web` |
| **Logs** | `~/Library/Logs/mcquack.heph.{out,err}.log` |
| **LaunchAgent** | `mcquack.eblume.heph` |
| **Ansible role** | `ansible/roles/heph` (tag `heph`) |
## What runs on indri
The launchagent runs the hub in server mode with three features enabled:
```
hephd --mode server --http-addr 0.0.0.0:8787 --db ~/.local/share/heph/heph.db \
  --web-root ~/.local/share/heph/web \
  --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
  --oidc-audience heph \
  --self-update --self-update-interval-secs 600
```
- **Server mode** exposes the HTTP sync endpoint (`/rpc`, `/sync/*`) that spokes
reconcile their op-log against.
- **Self-update** (10-minute poll) rebuilds `hephd` from the forge when a newer
release tag appears (`cargo install --git https://forge.eblu.me/eblume/hephaestus.git`).
Indri's Rust toolchain (`~/.cargo/bin`) is on the agent's `PATH` for this, and
the plist pins `RUSTUP_TOOLCHAIN=stable` — the
launchagent runs without mise, so a bare `cargo` shim would otherwise fall back
to rustup's *default* toolchain, which can lag behind heph's `rust-version` floor
(1.89) and silently fail the build.
- **PWA** (`--web-root`) serves the [heph-pwa] mobile shell; Caddy terminates TLS
at `heph.ops.eblu.me` so the PWA runs in a secure context (service worker,
install-to-home-screen, voice capture).
[heph-pwa]: https://github.com/eblume/hephaestus
The hub binds `0.0.0.0` so tailnet spokes can also sync directly
(`http://indri.tail8d86e.ts.net:8787`); access is gated by Authentik OIDC either
way — tailnet reachability alone is not enough.
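
The toolchain-floor failure described in the self-update bullet could be caught before a rebuild with a small pre-flight check. The following is a minimal sketch, not part of heph or the Ansible role: the function names are hypothetical, and it assumes the usual `rustc --version` output shape (`rustc MAJOR.MINOR.PATCH ...`). The 1.89 floor is the value quoted above.

```python
import re

# heph's `rust-version` floor as quoted in the text above.
RUST_VERSION_FLOOR = (1, 89)


def parse_rustc_version(version_line: str) -> tuple[int, int, int]:
    """Parse a `rustc --version` line such as 'rustc 1.89.0 (<hash> <date>)'."""
    match = re.match(r"rustc (\d+)\.(\d+)\.(\d+)", version_line.strip())
    if match is None:
        raise ValueError(f"unrecognized rustc version line: {version_line!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_floor(version_line: str, floor: tuple[int, int] = RUST_VERSION_FLOOR) -> bool:
    """Return True when the toolchain's major.minor is at or above the floor."""
    return parse_rustc_version(version_line)[:2] >= floor
```

To be meaningful, the version line should come from `rustc --version` run with the same environment the launchagent uses (including `RUSTUP_TOOLCHAIN=stable`), since an interactive shell may resolve a different toolchain.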
## Authentication (Authentik OIDC, device-code)
The hub verifies an OIDC bearer token on every sync. The `heph` application is a
**public** OAuth2 client using the **device-code flow** (RFC 8628), provisioned
in the [[authentik]] blueprint (`argocd/manifests/authentik/configmap-blueprint.yaml`):
- Issuer: `https://authentik.ops.eblu.me/application/o/heph/`
- Audience / client id: `heph`
- Restricted to the `admins` group (single-owner, sensitive data).
- Scope mappings: `openid`, `email`, `profile`, **`offline_access`**.
> **`offline_access` is required for durable sync.** The `heph` CLI requests
> `scope = "openid offline_access"`, and a refresh token is only issued for the
> 30-day refresh-token window when the provider actually grants `offline_access`.
> Without that scope mapping the refresh token is bound to the login **session**;
> once the session lapses, hephd's `refresh_token` grant returns `400 Bad
> Request`, the bearer can't be refreshed, and spoke sync silently degrades
> (`heph sync --status` → `auth_failure: true`). `heph auth login` papers over it
> until the next session expiry. Keep `offline_access` in the provider's
> `property_mappings`.
Because no Authentik instance ships a device-code flow by default, the blueprint
also creates `default-device-code-flow` and binds it to the default brand's
`flow_device_code`. Devices obtain a token with `heph auth login`; the PWA
currently takes a pasted token (in-app device-code login is upstream follow-up).
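
The `offline_access` requirement above can be illustrated with a short sketch of the two pieces involved: checking which requested scopes were actually granted, and building the form body of a standard OAuth2 `refresh_token` grant (RFC 6749 §6). This is not hephd's code. The helper names are hypothetical, and it assumes the token response carries a `scope` field, which RFC 6749 only requires when the granted scope differs from the requested one.

```python
from urllib.parse import urlencode

# Scope string the heph CLI requests, as quoted in the note above.
REQUESTED_SCOPE = "openid offline_access"


def missing_scopes(requested: str, granted: str) -> list[str]:
    """Return requested scopes that are absent from the granted scope string."""
    return sorted(set(requested.split()) - set(granted.split()))


def refresh_grant_body(refresh_token: str, client_id: str = "heph") -> str:
    """Build the form-encoded body of an RFC 6749 refresh_token grant.

    A public client (no secret) identifies itself with client_id only.
    """
    return urlencode(
        {
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": client_id,
        }
    )
```

With the provider mapping only `openid`, `email`, and `profile`, `missing_scopes(REQUESTED_SCOPE, "openid email profile")` reports `offline_access` as missing, which is the condition that precedes the `400 Bad Request` on refresh.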
## Data seeding (Path A, one-time)
The hub was seeded from the existing `gilbert` device so no task history was
lost. heph's data-safe bring-up ("Path A") has the hub **adopt the device's
identity** rather than rewriting the device:
1. Quiesce the seed device: `heph daemon stop` (on gilbert).
2. Copy its store to indri: `scp ~/.local/share/heph/heph.db indri:~/.local/share/heph/heph.db`.
3. Give the hub its **own device origin** (keeps gilbert's `owner_id` + data;
`hephd` regenerates a fresh `origin` on next start when it is missing):
```fish
ssh indri "sqlite3 ~/.local/share/heph/heph.db \"DELETE FROM meta WHERE key='origin';\""
```
4. `mise run provision-indri -- --tags heph` (installs hephd, stages the PWA,
loads the launchagent → hub starts on the seeded store).
Only `meta.origin` changes; `owner_id`, nodes, op-log, and links are copied
untouched. A clean `hephd --owner-id` / seed command is tracked upstream as
hephaestus follow-up — until then this manual reset is the documented path.
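
The effect of the step 3 reset can be sketched against a throwaway in-memory database. This is illustrative only: the `meta (key, value)` schema, the presence of an `owner_id` row in `meta`, and the sample values are assumptions, since the text above only confirms a `meta` table with a `key` column and an `origin` row.

```python
import sqlite3


def reset_origin(conn: sqlite3.Connection) -> int:
    """Delete the origin row (same statement as step 3); return rows removed."""
    cursor = conn.execute("DELETE FROM meta WHERE key = 'origin'")
    conn.commit()
    return cursor.rowcount


def demo() -> tuple[int, dict[str, str]]:
    """Run the reset on a toy store and return (rows removed, remaining meta)."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema and sample rows; the real heph.db layout may differ.
    conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO meta (key, value) VALUES (?, ?)",
        [("origin", "device-gilbert"), ("owner_id", "owner-1")],
    )
    removed = reset_origin(conn)
    remaining = dict(conn.execute("SELECT key, value FROM meta"))
    conn.close()
    return removed, remaining
```

Running `demo()` removes exactly one row and leaves the other `meta` entries in place, which mirrors the claim that only `meta.origin` changes.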
## Connecting a spoke (e.g. gilbert)
A device joins by running its local daemon with the hub URL + OIDC client and
logging in once:
```bash
hephd --mode local --hub-url http://indri.tail8d86e.ts.net:8787 \
--oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
--oidc-client-id heph
heph auth login --hub-url http://indri.tail8d86e.ts.net:8787 \
--issuer https://authentik.ops.eblu.me/application/o/heph/ --client-id heph
```
> **Use the direct `http://…:8787` tailnet URL for sync, not the Caddy HTTPS
> URL.** hephd's sync client is plain-HTTP-only; pointing `--hub-url` at
> `https://heph.ops.eblu.me` fails with a confusing `error sending request`
> (the HTTP connector rejects the `https` scheme before connecting). Tailscale
> encrypts the transport, and the OIDC bearer token still gates every request.
> `heph.ops.eblu.me` (Caddy TLS) exists only for the browser PWA, which needs a
> secure context. The cached token is keyed by the exact `--hub-url`, so use the
> same value for `hephd` and `heph auth login`.
> **Caveat:** `heph daemon` cannot yet bake hub/spoke flags into the generated
> launchd plist (upstream gap). On a spoke whose plist is managed by `heph
> daemon`, the hub/OIDC flags must be hand-added — and a later `heph daemon
> start/restart` will regenerate the plist and drop them. Avoid `heph daemon`
> subcommands on a configured spoke until that gap is closed; reload via
> `launchctl` instead.
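
The two constraints on `--hub-url` from the sync URL note (plain `http` scheme, and the exact same value for `hephd` and `heph auth login`) can be checked mechanically before configuring a spoke. A minimal sketch, assuming nothing about heph internals; the function name and messages are hypothetical.

```python
from urllib.parse import urlsplit


def hub_url_problems(hephd_hub_url: str, login_hub_url: str) -> list[str]:
    """Return human-readable problems with a spoke's hub URL configuration."""
    problems = []
    parts = urlsplit(hephd_hub_url)
    if parts.scheme != "http":
        # The sync client is described above as plain-HTTP-only.
        problems.append(f"sync needs an http:// URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        problems.append("hub URL has no host")
    if hephd_hub_url != login_hub_url:
        # The cached token is keyed by the exact --hub-url string.
        problems.append("hephd and `heph auth login` use different --hub-url values")
    return problems
```

An empty list means both checks pass. Note that the comparison is an exact string match on purpose, so a trailing slash on only one of the two values is reported as a mismatch.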
## Related
- [[indri]] — host
- [[authentik]] — OIDC provider
- [[caddy]] — TLS termination for `heph.ops.eblu.me`

@ -1,6 +1,7 @@
--- ---
title: Ntfy title: Ntfy
modified: 2026-02-17 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- notifications - notifications
@ -17,7 +18,7 @@ Self-hosted push notification service. Ntfy receives HTTP POST messages and deli
| **URL** | https://ntfy.ops.eblu.me | | **URL** | https://ntfy.ops.eblu.me |
| **Tailscale URL** | https://ntfy.tail8d86e.ts.net | | **Tailscale URL** | https://ntfy.tail8d86e.ts.net |
| **Namespace** | `ntfy` | | **Namespace** | `ntfy` |
| **Image** | `binwiederhier/ntfy:v2.17.0` | | **Image** | `registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix` (locally built) |
| **Upstream** | https://github.com/binwiederhier/ntfy | | **Upstream** | https://github.com/binwiederhier/ntfy |
| **Manifests** | `argocd/manifests/ntfy/` | | **Manifests** | `argocd/manifests/ntfy/` |

@ -1,6 +1,7 @@
--- ---
title: Tempo title: Tempo
modified: 2026-03-05 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- observability - observability
@ -18,7 +19,7 @@ Distributed tracing backend for BlumeOps infrastructure. Receives traces via OTL
| **Tailscale URL** | https://tempo.tail8d86e.ts.net | | **Tailscale URL** | https://tempo.tail8d86e.ts.net |
| **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net | | **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net |
| **Namespace** | `monitoring` | | **Namespace** | `monitoring` |
| **Image** | `grafana/tempo:2.10.1` | | **Image** | `registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4` (locally built) |
| **Storage** | 10Gi PVC (local filesystem) | | **Storage** | 10Gi PVC (local filesystem) |
| **Retention** | 7 days | | **Retention** | 7 days |

@ -56,8 +56,8 @@ services:
- name: nvidia-device-plugin - name: nvidia-device-plugin
type: argocd type: argocd
last-reviewed: 2026-03-27 last-reviewed: 2026-06-04
current-version: "v0.19.0" current-version: "v0.19.2"
upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases
notes: DaemonSet + RuntimeClass on ringtail for GPU workloads notes: DaemonSet + RuntimeClass on ringtail for GPU workloads
@ -159,10 +159,13 @@ services:
- name: external-secrets - name: external-secrets
type: argocd type: argocd
last-reviewed: 2026-03-25 last-reviewed: 2026-06-04
current-version: "v2.2.0" current-version: "v2.2.0"
upstream-source: https://github.com/external-secrets/external-secrets/releases upstream-source: https://github.com/external-secrets/external-secrets/releases
notes: Static kustomize manifests rendered from upstream Helm chart notes: >-
Static kustomize manifests rendered from upstream Helm chart. Controller
image is locally built from the forge mirror via containers/external-secrets/container.py
(single all_providers static Go binary).
- name: 1password-connect - name: 1password-connect
type: argocd type: argocd
@ -411,6 +414,23 @@ services:
upstream-source: https://github.com/caddyserver/caddy/releases upstream-source: https://github.com/caddyserver/caddy/releases
notes: Built from source with Gandi DNS and Layer 4 plugins notes: Built from source with Gandi DNS and Layer 4 plugins
- name: heph
type: ansible
last-reviewed: 2026-06-05
current-version: "v1.2.1"
upstream-source: https://forge.eblu.me/eblume/hephaestus/releases
notes: >-
hephaestus task/context sync hub on indri (server-mode launchagent,
ansible/roles/heph; cargo-built from the forge). SELF-UPDATING: hephd
polls the forge for newer releases every 10 min and rebuilds + restarts
itself, so the running version drifts AHEAD of the ansible heph_version
pin. current-version here is the last observed/deployed tag, not a hard
pin. To verify, confirm `curl https://heph.ops.eblu.me/config` is served
(hub up) and read the live version from the hub log's `current=` line.
self-update vs IaC-pin drift is tracked in the heph "Hephaestus" project:
"Reconcile hephd self-update with ansible-pinned version (drift on indri
hub)" (node 01KTBXWT6XTHNDH92CVJY88E5K).
- name: borgmatic - name: borgmatic
type: ansible type: ansible
last-reviewed: 2026-04-15 last-reviewed: 2026-04-15