Erich Blume 11aa25c9f4
2026-06-06 10:19:11 -07:00


title: Set up a sync hub (and connect a device)
modified: 2026-06-04
tags: how-to

Set up a sync hub (and connect a device)

How to stand up the canonical hub (on indri, in blumeops) and connect an existing local device (e.g. gilbert) to it as an offline-capable spoke, without migrating or risking the device's data.

The model

heph is hub-and-spoke, not a peer mesh (design §4, v1-prototype-tech-spec §3/§12/§13):

  • Hub (hephd --mode server): a full replica that also exposes an HTTP endpoint that spokes sync against. There is one canonical hub (indri).
  • Spoke (hephd --mode local --hub-url <hub>): its own full SQLite replica, fully usable offline, with an append-only op-log; it background-syncs (pull → merge → push) whenever the hub is reachable. Every device is a spoke.
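One spoke sync cycle can be sketched in shell as follows. It is a minimal illustration under assumptions, not hephd's code: hub_reachable, pull_ops, merge_ops and push_ops are hypothetical stand-ins for the daemon's internals. What it shows is the ordering (pull, merge locally, then push) and that an unreachable hub simply skips the cycle while the local replica stays usable.

```shell
# Hypothetical sketch of one spoke sync cycle (not hephd's actual code).
# hub_reachable, pull_ops, merge_ops and push_ops are placeholder functions.
sync_cycle() {
  if ! hub_reachable; then
    # Offline: skip this cycle; local edits keep accumulating in the op-log.
    return 0
  fi
  pull_ops  || return 1   # fetch ops the hub has that this replica lacks
  merge_ops || return 1   # apply them to the local SQLite replica
  push_ops  || return 1   # send this replica's not-yet-pushed ops to the hub
}
```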

Surfaces (CLI / TUI / nvim) only ever talk to the local daemon over the unix socket; that daemon handles the hub conversation in the background.

Transport vs. identity. Tailscale gives the devices a secure private network (reachability + encryption). Authentik sits on top as the authorization layer: the hub requires a valid OIDC bearer token on every op exchange, so merely being on the tailnet is not enough — this is the owner's most sensitive data.
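Because the hub checks the bearer token on every exchange, a missing or expired token surfaces as HTTP 401 rather than as a network failure. The helper below is a hedged sketch for telling those cases apart when probing the hub by hand from a shell; classify_hub_response is a hypothetical name, and the hub path in the usage comment is a placeholder, since this guide does not document the hub's endpoints.

```shell
# Classify one hub request from curl's exit code and HTTP status.
# Prints one of: ok | auth | offline | error
classify_hub_response() {
  local curl_rc=$1 http=$2   # curl exit status; curl -w '%{http_code}' value
  if [ "$curl_rc" -ne 0 ] || [ "$http" = "000" ]; then
    echo offline             # no HTTP response at all (DNS, connect, timeout)
  elif [ "$http" = "401" ]; then
    echo auth                # token missing or no longer valid: log in again
  elif [ "$http" -ge 200 ] && [ "$http" -lt 300 ]; then
    echo ok
  else
    echo error               # any other status (403, 5xx, ...)
  fi
}

# Usage (HUB_PATH is a placeholder for a real hub endpoint):
# http=$(curl -s --max-time 10 -o /dev/null -w '%{http_code}' \
#          -H "Authorization: Bearer $TOKEN" "$HUB/$HUB_PATH"); rc=$?
# classify_hub_response "$rc" "$http"
```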

The data-safety principle: the hub adopts the device, not the reverse

A device's owner_id is embedded in some node ids (journals, tags), the op-log, and link rows. Rewriting it in place is the risky operation we avoid. Instead ("Path A"): the hub takes on the existing device's identity — same owner_id and data — so the device is never rewritten. gilbert's store is untouched; indri is brought up as a copy of it and the two sync forward.

A device that is set up after the hub exists skips all of this: configure it with the hub + Authentik from first launch ("born authed"), before it creates data, and it simply joins.

1. Authentik: register the heph application

Create an OIDC/OAuth2 application + provider in Authentik for heph, configured for the device-code (RFC 8628) flow. Note the values the daemon and devices need:

  • Issuer — e.g. https://authentik.ops.eblu.me/application/o/heph/
  • Client id — the device-code client id (this is also the token audience).
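A quick way to check the issuer value is to fetch its OpenID Connect discovery document, which lives at <issuer>/.well-known/openid-configuration (Authentik serves one per application under application/o/<slug>/). The issuer field it reports is the exact string tokens carry in their iss claim, trailing slash included. A small sketch, where discovery_url is a hypothetical helper that accepts the issuer with or without its trailing slash:

```shell
# Build the OIDC discovery URL: drop one trailing '/', then append the well-known path.
discovery_url() {
  local issuer=${1%/}
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Usage: print the advertised issuer and, if the provider supports RFC 8628,
# its device authorization endpoint (requires curl and jq):
#   curl -s "$(discovery_url https://authentik.ops.eblu.me/application/o/heph/)" \
#     | jq -r '.issuer, .device_authorization_endpoint'
```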

Token lifetime (avoid frequent re-logins)

Token lifetimes are set on the Authentik provider, not in heph — heph honors whatever expires_in Authentik returns and silently refreshes using the offline_access refresh token (both the CLI/daemon and the PWA do this). To avoid re-authenticating often, set generous validities on the heph provider:

  • Access token validity: e.g. hours=24. The hub validates exp and keeps no revocation list, so this is the window in which a leaked token stays usable; on a Tailscale-only hub, 24 to 48 hours is a reasonable trade-off.
  • Refresh token validity: e.g. days=30+. This is the setting that stops the re-logins: while the refresh token is valid, the spoke and the PWA renew silently with no browser round-trip. A short refresh window is the usual cause of "I have to log in constantly".
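To confirm the validity Authentik actually issues, you can read the iat and exp claims of an access token, assuming it is a JWT. The sketch below decodes only the payload and does not verify the signature, so it is for inspection, never for trust decisions; jwt_payload and jwt_lifetime are hypothetical helpers, and how you obtain the token is left out.

```shell
# Print the JSON payload (middle segment) of a JWT. No signature check.
jwt_payload() {
  local seg
  # base64url -> base64: '_' becomes '/', '-' becomes '+'
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the lifetime the issuer granted, in seconds (exp - iat).
jwt_lifetime() {
  local json exp iat
  json=$(jwt_payload "$1")
  exp=$(printf '%s' "$json" | sed -n 's/.*"exp":[[:space:]]*\([0-9]*\).*/\1/p')
  iat=$(printf '%s' "$json" | sed -n 's/.*"iat":[[:space:]]*\([0-9]*\).*/\1/p')
  echo $(( exp - iat ))
}
```

A result of 86400 corresponds to hours=24 on the provider.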

iOS PWA caveat: Safari can purge an un-installed PWA's localStorage (where its tokens live) after ~7 idle days regardless of these settings. Installing the app to the home screen mitigates it, but expect the occasional re-login on iOS.

2. Bring up the hub on indri

Seed it from gilbert (Path A). Quiesce gilbert (heph daemon stop), copy its store to indri, and give indri its own device origin so the two replicas don't share one (see Current gaps — this seeding step is the bit the blumeops deployment finalizes). indri now holds gilbert's data under the same owner_id.
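A hedged sketch of the copy step, run on gilbert, assuming the store is a single SQLite file and that sqlite3 and ssh access to indri are available. Using sqlite3's .backup command, rather than copying the file while -wal/-shm siblings may exist, produces one self-consistent file. The source path, the snapshot path, and the seed_hub and run helpers are hypothetical; the origin reset stays a manual step because this guide does not define a command for it. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical Path A seeding helper: seed_hub <gilbert-store-path> <ssh-host>
seed_hub() {
  local src_db=$1 hub=$2 snap=/tmp/heph-seed.db
  run heph daemon stop || return 1                      # quiesce gilbert first
  run sqlite3 "$src_db" ".backup '$snap'" || return 1   # consistent snapshot
  run scp "$snap" "$hub:/var/lib/heph/heph.db" || return 1
  # Still manual: give indri its own device origin before starting its hephd.
  echo "next: reset the device origin on $hub, then start hephd --mode server"
}
```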

Run the hub with auth enabled (issuer and audience together turn auth on; omit both only for local dev):

```shell
hephd --mode server \
  --http-addr 0.0.0.0:8787 \
  --db /var/lib/heph/heph.db \
  --oidc-issuer  https://authentik.ops.eblu.me/application/o/heph/ \
  --oidc-audience <heph-client-id>
```
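Since auth is only on when both the issuer and the audience are given, a launcher can refuse to start a half-configured hub. A minimal wrapper sketch; check_hub_auth, the HEPH_OIDC_ISSUER / HEPH_OIDC_AUDIENCE variables, and the HEPH_DEV opt-out are hypothetical conventions of this wrapper, not anything hephd reads itself.

```shell
# Succeed only if both OIDC settings are present, or both are empty and
# local development was explicitly requested with HEPH_DEV=1.
check_hub_auth() {
  local issuer=$1 audience=$2
  if [ -n "$issuer" ] && [ -n "$audience" ]; then
    return 0                                   # auth enabled
  fi
  if [ -z "$issuer" ] && [ -z "$audience" ] && [ "${HEPH_DEV:-}" = 1 ]; then
    return 0                                   # deliberate local-dev run
  fi
  echo "refusing to start: set both --oidc-issuer and --oidc-audience" >&2
  return 1
}

# Usage:
# check_hub_auth "$HEPH_OIDC_ISSUER" "$HEPH_OIDC_AUDIENCE" &&
#   exec hephd --mode server --http-addr 0.0.0.0:8787 --db /var/lib/heph/heph.db \
#     --oidc-issuer "$HEPH_OIDC_ISSUER" --oidc-audience "$HEPH_OIDC_AUDIENCE"
```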

The first identity to authenticate claims the hub's owner; thereafter only that identity is served (single-owner today — see design and the Adoption + multi-tenant task for the multi-tenancy seam).
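The claim-on-first-login rule behaves like the sketch below. It is illustrative only: using the OIDC sub claim as the identity, storing the owner in a plain file, and the authorize_subject name are all assumptions, not hephd's implementation.

```shell
# Single-owner gate: the first authenticated subject becomes the owner;
# afterwards only that subject is allowed. owner_file is a hypothetical path.
authorize_subject() {
  local sub=$1 owner_file=$2
  if [ ! -s "$owner_file" ]; then
    printf '%s\n' "$sub" > "$owner_file"   # first identity claims the hub
    return 0
  fi
  [ "$(cat "$owner_file")" = "$sub" ]      # afterwards: owner only
}
```

A real hub would make the claim atomic (for example inside a database transaction) so that two concurrent first logins cannot both succeed; this file-based check-then-write does not guarantee that.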

3. Point gilbert at the hub (spoke)

Run gilbert's daemon in local mode with the hub URL, the OIDC issuer, and its OIDC client id, then log in once (the device-code flow caches a bearer token in the OS keyring):

```shell
hephd --mode local \
  --hub-url http://indri.<tailnet>.ts.net:8787 \
  --oidc-issuer    https://authentik.ops.eblu.me/application/o/heph/ \
  --oidc-client-id <heph-client-id>

# one-time browser login on this device:
heph auth login \
  --hub-url   http://indri.<tailnet>.ts.net:8787 \
  --issuer    https://authentik.ops.eblu.me/application/o/heph/ \
  --client-id <heph-client-id>
```

The spoke now attaches the (auto-refreshing) bearer token to every hub request and background-syncs on its interval.
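A device-code login (RFC 8628) shows a user code, then polls the token endpoint until the browser approval completes. The spec's polling rules are easy to get wrong: authorization_pending means keep polling at the same interval, slow_down means add 5 seconds to the interval for all later requests, any other error (access_denied, expired_token) ends the attempt, and the interval defaults to 5 seconds when the server sends none. The sketch below encodes those rules; request_token is a hypothetical stub and none of this is heph auth login's code.

```shell
# RFC 8628 section 3.5: next polling interval for a token-endpoint error.
# Prints the new interval in seconds, or returns 1 when polling must stop.
next_poll_interval() {
  local interval=$1 error=$2
  case $error in
    authorization_pending) echo "$interval" ;;
    slow_down)             echo $(( interval + 5 )) ;;
    *)                     return 1 ;;   # access_denied, expired_token, ...
  esac
}

# Poll loop. request_token is a hypothetical function that POSTs
# grant_type=urn:ietf:params:oauth:grant-type:device_code, device_code and
# client_id to the token endpoint; on success it prints the access token,
# otherwise it prints the error code and returns non-zero.
poll_for_token() {
  local interval=${1:-5} out
  while :; do
    sleep "$interval"
    if out=$(request_token); then
      printf '%s\n' "$out"; return 0
    fi
    interval=$(next_poll_interval "$interval" "$out") || return 1
  done
}
```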

4. Verify

```shell
heph sync --status     # hub url, last push/pull cursors, sync health
heph sync              # force a cycle now
```

heph sync --status also reports sync health — the time of the last successful exchange, any last error, and whether the spoke is currently failing to authenticate. The same signal is surfaced live in heph-tui's status line (last-sync age · pending conflicts · an auth-failure flag), so a silently-broken spoke is visible at a glance rather than buried in the daemon log.
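The status-line indicator can be approximated as below. The glyphs follow heph-tui's ('⟳ <age>' since the last good sync, '⚠ auth', '⚠ N conflicts'), and an offline state is assumed for an unreachable hub; the age buckets, the precedence (an auth or offline warning replaces the age, conflicts are appended), and the function names are assumptions for illustration.

```shell
# Compact age: 42s, 5m, 3h, 2d.
fmt_age() {
  local s=$1
  if   [ "$s" -lt 60 ];    then echo "${s}s"
  elif [ "$s" -lt 3600 ];  then echo "$(( s / 60 ))m"
  elif [ "$s" -lt 86400 ]; then echo "$(( s / 3600 ))h"
  else                          echo "$(( s / 86400 ))d"
  fi
}

# sync_indicator NOW LAST_SUCCESS STATE CONFLICTS  (times in epoch seconds)
# STATE is one of: ok | auth | offline
sync_indicator() {
  local now=$1 last=$2 state=$3 conflicts=$4 out
  case $state in
    auth)    out="⚠ auth" ;;                          # re-login needed
    offline) out="⚠ offline" ;;                       # hub unreachable
    *)       out="⟳ $(fmt_age $(( now - last )))" ;;  # age of last good sync
  esac
  if [ "$conflicts" -gt 0 ]; then
    out="$out · ⚠ $conflicts conflicts"
  fi
  printf '%s\n' "$out"
}
```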

Make a change on gilbert, force a sync, and confirm it appears via the hub.

Current gaps (finalized by the blumeops deployment)

The flag-level flow above works today; two enablers make it a clean, managed deployment rather than a hand-run process — tracked in the Hephaestus project:

  • heph daemon only generates a --mode local service (no --hub-url / --oidc-*). So for now the hub and the spoke config are expressed as hephd flags (run directly, or via the blumeops-managed systemd unit), not via heph daemon start.
  • Path A seeding is manual (copy the store + reset the device origin). A small enabler — seed a hub from a snapshot with a fresh origin, or hephd --owner-id — would make this one step.
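Until heph daemon can emit the hub or spoke flags, a service for the hub has to be written by hand (or by blumeops). If the host runs systemd, a minimal unit might look like the one this helper writes. The binary path /usr/local/bin/hephd, the heph user, the unit name, and write_hub_unit itself are assumptions; the blumeops-managed unit may differ.

```shell
# Write a minimal systemd unit for the hub to $1. Paths and names are assumptions.
write_hub_unit() {
  local dest=$1 issuer=$2 audience=$3
  cat > "$dest" <<EOF
[Unit]
Description=heph sync hub (hephd --mode server)
After=network-online.target
Wants=network-online.target

[Service]
User=heph
ExecStart=/usr/local/bin/hephd --mode server --http-addr 0.0.0.0:8787 --db /var/lib/heph/heph.db --oidc-issuer ${issuer} --oidc-audience ${audience}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
# write_hub_unit /etc/systemd/system/hephd-hub.service \
#   https://authentik.ops.eblu.me/application/o/heph/ "<heph-client-id>"
# systemctl daemon-reload && systemctl enable --now hephd-hub.service
```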