| title | modified | tags |
|---|---|---|
| Set up a sync hub (and connect a device) | 2026-06-04 | |
Set up a sync hub (and connect a device)
How to stand up the canonical hub (on indri, in blumeops) and connect an
existing local device (e.g. gilbert) to it as an offline-capable spoke,
without migrating or risking the device's data.
The model
heph is hub-and-spoke, not a peer mesh (design §4, v1-prototype-tech-spec §3/§12/§13):
- Hub (hephd --mode server): a full replica that also exposes an HTTP endpoint others sync against. There is one canonical hub (indri).
- Spoke (hephd --mode local --hub-url <hub>): its own full SQLite replica, fully usable offline, with an append-only op-log; it background-syncs (pull → merge → push) when the hub is reachable. Every device is a spoke.
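The pull → merge → push cycle can be sketched as follows. This is a minimal illustration, not heph's implementation: the Op and Replica types, the (origin, seq) op identity, and the list-index cursors are all assumptions made for the example, and real merge logic (including conflict handling) is not shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    origin: str   # hypothetical: id of the device that authored the op
    seq: int      # hypothetical: per-origin sequence number
    payload: str


@dataclass
class Replica:
    origin: str
    log: list = field(default_factory=list)  # append-only op-log
    seen: set = field(default_factory=set)   # (origin, seq) keys already applied
    next_seq: int = 1

    def write(self, payload: str) -> Op:
        """Record a local change; needs no hub connection."""
        op = Op(self.origin, self.next_seq, payload)
        self.next_seq += 1
        self.apply(op)
        return op

    def apply(self, op: Op) -> bool:
        """Append an op unless it is already known (idempotent merge)."""
        key = (op.origin, op.seq)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.log.append(op)
        return True


def sync_once(spoke: Replica, hub: Replica, cursors: dict) -> tuple:
    """One pull -> merge -> push cycle; returns (ops merged, ops pushed)."""
    # Pull: everything the hub logged since this spoke's last pull.
    pulled = hub.log[cursors["pull"]:]
    cursors["pull"] = len(hub.log)
    merged = sum(spoke.apply(op) for op in pulled)

    # Push: everything in the local log since the last push.
    # Ops the hub already has are dropped by its idempotent apply().
    pending = spoke.log[cursors["push"]:]
    cursors["push"] = len(spoke.log)
    pushed = sum(hub.apply(op) for op in pending)
    return merged, pushed
```

Because apply() ignores ops it has already seen, re-sending an op is harmless, which is what lets a spoke keep writing offline and catch up whenever the hub becomes reachable.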
Surfaces (CLI / TUI / nvim) only ever talk to the local daemon over the unix socket; that daemon handles the hub conversation in the background.
Transport vs. identity. Tailscale gives the devices a secure private network (reachability + encryption). Authentik sits on top as the authorization layer: the hub requires a valid OIDC bearer token on every op exchange, so merely being on the tailnet is not enough — this is the owner's most sensitive data.
The data-safety principle: the hub adopts the device, not the reverse
A device's owner_id is embedded in some node ids (journals, tags), the op-log,
and link rows. Rewriting it in place is the risky operation we avoid. Instead
("Path A"): the hub takes on the existing device's identity — same
owner_id and data — so the device is never rewritten. gilbert's store is
untouched; indri is brought up as a copy of it and the two sync forward.
A device that is set up after the hub exists skips all of this: configure it with the hub + Authentik from first launch ("born authed"), before it creates data, and it simply joins.
1. Authentik: register the heph application
Create an OIDC/OAuth2 application + provider in Authentik for heph, configured for the device-code (RFC 8628) flow. Note the values the daemon and devices need:
- Issuer: e.g. https://authentik.ops.eblu.me/application/o/heph/
- Client id: the device-code client id (this is also the token audience).
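One way to sanity-check the provider before wiring up heph is to read its OIDC discovery document and confirm it advertises a device authorization endpoint. The sketch below relies only on the standard discovery path and the device_authorization_endpoint metadata field from RFC 8628; the helper names are hypothetical, and fetch_metadata needs network access to the issuer.

```python
import json
from urllib.request import urlopen


def discovery_url(issuer: str) -> str:
    """Build the OIDC discovery URL for an issuer (trailing slash optional)."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def supports_device_flow(metadata: dict) -> bool:
    """True if the provider metadata advertises the RFC 8628 endpoint."""
    return bool(metadata.get("device_authorization_endpoint"))


def fetch_metadata(issuer: str, timeout: float = 10.0) -> dict:
    """Download and parse the discovery document (requires network)."""
    with urlopen(discovery_url(issuer), timeout=timeout) as resp:
        return json.load(resp)
```

For example, supports_device_flow(fetch_metadata("https://authentik.ops.eblu.me/application/o/heph/")) should be true once the provider is configured for the device-code flow.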
Token lifetime (avoid frequent re-logins)
Token lifetimes are set on the Authentik provider, not in heph — heph honors
whatever expires_in Authentik returns and silently refreshes using the
offline_access refresh token (both the CLI/daemon and the PWA do this). To
avoid re-authenticating often, set generous validities on the heph provider:
- Access token validity: e.g. hours=24. The hub validates tokens but keeps no revocation list, so this is the window in which a leaked token stays usable; on a Tailscale-only hub, 24–48h is a reasonable trade.
- Refresh token validity: e.g. days=30+. This is the setting that stops the re-logins: while the refresh token is valid, the spoke and the PWA renew silently with no browser round-trip. A short refresh window is the usual cause of "I have to log in constantly".
iOS PWA caveat: Safari can purge an un-installed PWA's localStorage (where its tokens live) after ~7 idle days regardless of these settings. Installing the app to the home screen mitigates it, but expect the occasional re-login on iOS.
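How the two validities interact can be sketched as a small decision function. This is an illustration under assumptions, not heph's refresh logic: the function name, the 60-second skew, and the idea of tracking a refresh-token expiry client-side are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def token_action(now, access_expires_at, refresh_expires_at,
                 skew=timedelta(seconds=60)):
    """Return "use", "refresh", or "login" for the current token state."""
    if now + skew < access_expires_at:
        return "use"       # access token still comfortably valid
    if now < refresh_expires_at:
        return "refresh"   # silent renewal, no browser round-trip
    return "login"         # refresh token expired: interactive re-login


# Example with the validities suggested above (hours=24, days=30).
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
access_exp = issued + timedelta(hours=24)
refresh_exp = issued + timedelta(days=30)
print(token_action(issued + timedelta(hours=25), access_exp, refresh_exp))
```

With a short refresh validity the "login" branch is reached after only a brief idle period, which is the "I have to log in constantly" symptom.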
2. Bring up the hub on indri
Seed it from gilbert (Path A). Quiesce gilbert (heph daemon stop),
copy its store to indri, and give indri its own device origin so the two
replicas don't share one (see Current gaps — this seeding step is the bit the
blumeops deployment finalizes). indri now holds gilbert's data under the same
owner_id.
Run the hub with auth enabled (issuer and audience together turn auth on; omit both only for local dev):
hephd --mode server \
--http-addr 0.0.0.0:8787 \
--db /var/lib/heph/heph.db \
--oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
--oidc-audience <heph-client-id>
The first identity to authenticate claims the hub's owner; thereafter only
that identity is served (single-owner today — see design and the
Adoption + multi-tenant task for the multi-tenancy seam).
3. Point gilbert at the hub (spoke)
Run gilbert's daemon in local mode with the hub url + its OIDC client id, then
log in once (the device-code flow caches a bearer token in the OS keyring):
hephd --mode local \
--hub-url http://indri.<tailnet>.ts.net:8787 \
--oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
--oidc-client-id <heph-client-id>
# one-time browser login on this device:
heph auth login \
--hub-url http://indri.<tailnet>.ts.net:8787 \
--issuer https://authentik.ops.eblu.me/application/o/heph/ \
--client-id <heph-client-id>
The spoke now attaches the (auto-refreshing) bearer token to every hub request and background-syncs on its interval.
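Attaching the token means sending a standard Authorization: Bearer header on each request. The sketch below builds such a request with the Python standard library; the hub_request helper and the /sync path are hypothetical, since the hub's actual endpoints are not described here.

```python
from urllib.request import Request


def hub_request(hub_url: str, path: str, token: str, body: bytes = None) -> Request:
    """Build a hub request carrying the bearer token (does not send it)."""
    url = hub_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, data=body, headers={"Authorization": f"Bearer {token}"})


# "/sync" is a placeholder path, not a documented heph endpoint.
req = hub_request("http://indri.example.ts.net:8787", "/sync", "abc")
print(req.full_url, req.get_header("Authorization"))
```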
4. Verify
heph sync --status # hub url, last push/pull cursors, sync health
heph sync # force a cycle now
heph sync --status also reports sync health — the time of the last
successful exchange, any last error, and whether the spoke is currently failing
to authenticate. The same signal is surfaced live in heph-tui's status line
(last-sync age · pending conflicts · an auth-failure flag), so a silently-broken
spoke is visible at a glance rather than buried in the daemon log.
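The mapping from sync health to a one-glance indicator might look like this. It is a sketch under assumptions: the SyncHealth field names, the indicator strings, the priority order (auth failure, then other errors, then conflicts), and treating any non-auth error as "offline" are illustrative rather than taken from heph-tui.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SyncHealth:
    last_success: Optional[datetime] = None  # time of last good exchange
    last_error: Optional[str] = None         # message from the last failure
    auth_failed: bool = False                # re-login needed
    pending_conflicts: int = 0


def format_age(delta: timedelta) -> str:
    """Render an age with its largest whole unit, e.g. 5m or 2h."""
    secs = max(0, int(delta.total_seconds()))
    for unit, size in (("d", 86400), ("h", 3600), ("m", 60)):
        if secs >= size:
            return f"{secs // size}{unit}"
    return f"{secs}s"


def status_line(h: SyncHealth, now: datetime) -> str:
    """Pick the single most urgent indicator for the status line."""
    if h.auth_failed:
        return "⚠ auth"
    if h.last_error is not None:
        return "⚠ offline"
    if h.pending_conflicts:
        return f"⚠ {h.pending_conflicts} conflicts"
    if h.last_success is None:
        return "⟳ never"  # placeholder text for a spoke that has not synced
    return f"⟳ {format_age(now - h.last_success)}"
```

Passing the current time in as an argument keeps the function pure, so a periodic UI tick can call it to advance the displayed age even when nothing else changes.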
Make a change on gilbert, force a sync, and confirm it appears via the hub.
Current gaps (finalized by the blumeops deployment)
The flag-level flow above works today; two enablers make it a clean, managed
deployment rather than a hand-run process — tracked in the Hephaestus project:
- heph daemon only generates a --mode local service (no --hub-url / --oidc-*). So for now the hub and the spoke config are expressed as hephd flags (run directly, or via the blumeops-managed systemd unit), not via heph daemon start.
- Path A seeding is manual (copy the store + reset the device origin). A small enabler (seed a hub from a snapshot with a fresh origin, or hephd --owner-id) would make this one step.
Related
- run-the-daemon: manage the local daemon as an OS service
- install-heph: install heph/hephd and the plugin
- design: §4 the connect-only, hub-and-spoke model
- v1-prototype-tech-spec: §3 runtime modes, §12 sync, §13 auth