hephaestus/docs/how-to/set-up-sync-hub.md
Erich Blume 11aa25c9f4
feat(heph-tui,hephd): surface sync health (last-sync age, conflicts, auth failure)
A spoke could be silently failing to sync (expired token → 401, or hub
unreachable) with the only signal buried in the daemon log. Now:

- hephd tracks SyncHealth (last attempt/success time, last error, auth-failure
  flag) from the background sync loop and sync.now, classifying a 401 as an auth
  failure. sync.status returns it plus the pending merge-conflict count.
- heph-tui shows a live status-line indicator (spoke only): '⟳ <age>' since the
  last good sync, red '⚠ auth' when re-login is needed, '⚠ offline' when the hub
  is unreachable, and '⚠ N conflicts' when conflicts are pending. The event loop
  polls on a 2s tick so the age advances and failures appear while idle.
- docs: recommended Authentik access/refresh token validity to stop frequent
  re-logins (with the iOS PWA localStorage-eviction caveat).

Closes the 'Add hub connection status to heph-tui' and 'Spoke sync health:
surface unhealthy state instead of silent 401 spam' backlog items.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 10:19:11 -07:00

---
title: Set up a sync hub (and connect a device)
modified: 2026-06-04
tags:
- how-to
---
# Set up a sync hub (and connect a device)
How to stand up the canonical **hub** (on `indri`, in blumeops) and connect an
existing **local** device (e.g. `gilbert`) to it as an offline-capable spoke,
**without migrating or risking the device's data**.
## The model
heph is **hub-and-spoke**, not a peer mesh ([[design]] §4, [[v1-prototype-tech-spec]] §3/§12/§13):
- **Hub** — `hephd --mode server`: a full replica that also exposes an HTTP
endpoint others sync against. One canonical hub (`indri`).
- **Spoke** — `hephd --mode local --hub-url <hub>`: its own full SQLite replica,
**fully usable offline**, with an append-only op-log; it background-syncs
(pull → merge → push) when the hub is reachable. Every device is a spoke.

Surfaces (CLI / TUI / nvim) only ever talk to the **local** daemon over the Unix
socket; that daemon handles the hub conversation in the background.

**Transport vs. identity.** Tailscale gives the devices a secure private network
(reachability + encryption). **Authentik** sits on top as the authorization
layer: the hub requires a valid OIDC bearer token on every op exchange, so
merely being on the tailnet is not enough. This is the owner's most sensitive
data.
## The data-safety principle: the hub adopts the device, not the reverse
A device's `owner_id` is embedded in some node ids (journals, tags), the op-log,
and link rows. Rewriting it in place is the risky operation we **avoid**. Instead
(**"Path A"**): the hub takes on the *existing device's* identity — same
`owner_id` and data — so the device is **never rewritten**. `gilbert`'s store is
untouched; `indri` is brought up as a copy of it and the two sync forward.
> A device that is set up **after** the hub exists skips all of this: configure
> it with the hub + Authentik from first launch ("born authed"), before it
> creates data, and it simply joins.
## 1. Authentik: register the heph application
Create an OIDC/OAuth2 application + provider in Authentik for heph, configured
for the **device-code (RFC 8628) flow**. Note the values the daemon and devices
need:
- **Issuer** — e.g. `https://authentik.ops.eblu.me/application/o/heph/`
- **Client id** — the device-code client id (this is also the token *audience*).
### Token lifetime (avoid frequent re-logins)
Token lifetimes are set on the Authentik **provider**, not in heph — heph honors
whatever `expires_in` Authentik returns and silently refreshes using the
`offline_access` refresh token (both the CLI/daemon and the PWA do this). To
avoid re-authenticating often, set generous validities on the heph provider:
- **Access token validity** — e.g. `hours=24`. The hub validates `exp` and keeps
no revocation list, so this is the window in which a leaked token stays usable;
on a Tailscale-only hub, 24–48h is a reasonable trade.
- **Refresh token validity** — e.g. `days=30`+. This is the setting that stops
the re-logins: while the refresh token is valid, the spoke **and** the PWA
renew silently with no browser round-trip. A short refresh window is the usual
cause of "I have to log in constantly".
> **iOS PWA caveat:** Safari can purge an *un-installed* PWA's `localStorage`
> (where its tokens live) after ~7 idle days regardless of these settings.
> Installing the app to the home screen mitigates it, but expect the occasional
> re-login on iOS.
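
Because the hub checks `exp` and nothing else, it can help to confirm what validity Authentik actually issued. The following is a minimal sketch, assuming the access token is a JWT (`header.payload.signature`) and that you already have the token string in hand; reading it out of the keyring is not covered here. `jwt_seconds_left` is a hypothetical helper, not part of heph, and it relies on `base64 -d` (GNU coreutils and recent macOS).

```bash
# Hypothetical helper (not part of heph): seconds until a JWT's `exp` claim.
# Usage: jwt_seconds_left <token> [now-epoch]
jwt_seconds_left() {
  local payload exp now
  # The payload is the second dot-separated field, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so `base64 -d` accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  # Pull the numeric exp claim out of the decoded JSON.
  exp=$(printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$exp" ] || { echo "no exp claim found" >&2; return 1; }
  # Allow the caller to pin "now" (useful for testing); default to the clock.
  now=${2:-$(date +%s)}
  echo $(( exp - now ))
}
```

With `hours=24` configured on the provider, a freshly issued token should report a value close to 86400 (24 × 3600 seconds); a negative result means the token has already expired.
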
## 2. Bring up the hub on `indri`
**Seed it from `gilbert` (Path A).** Quiesce `gilbert` (`heph daemon stop`),
copy its store to `indri`, and give `indri` its **own device origin** so the two
replicas don't share one (see *Current gaps* — this seeding step is the bit the
blumeops deployment finalizes). `indri` now holds `gilbert`'s data under the same
`owner_id`.
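
The copy step can be made a little safer by staging a checksum-verified snapshot before shipping it to `indri`. This is a sketch under assumptions: `snapshot_store` is a hypothetical helper, the source path is a placeholder for wherever `gilbert` keeps its store, and it presumes the daemon was stopped cleanly so the single database file is complete (no unflushed write-ahead log beside it). It does not perform the device-origin reset, which remains a manual step.

```bash
# Hypothetical helper (not part of heph): copy a store file and verify the copy.
# Usage: snapshot_store <source-db> <snapshot-path>
snapshot_store() {
  local src="$1" dst="$2" a b
  [ -f "$src" ] || { echo "no store at $src" >&2; return 1; }
  cp -p "$src" "$dst" || return 1
  # Reading from stdin, cksum prints "<crc> <size>" with no file name,
  # so the two results can be compared directly.
  a=$(cksum < "$src")
  b=$(cksum < "$dst")
  [ "$a" = "$b" ] || { echo "snapshot differs from source" >&2; return 1; }
  echo "snapshot ok: $dst"
}

# Intended sequence on gilbert (paths are placeholders, shown as comments):
#   heph daemon stop
#   snapshot_store /path/to/gilbert/heph.db /tmp/heph-seed.db
#   scp /tmp/heph-seed.db indri:/var/lib/heph/heph.db
```

Working from a snapshot rather than the live file means `gilbert`'s own store is only ever read, which fits the principle that the device is never rewritten.
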

Run the hub with auth enabled (issuer **and** audience together turn auth on;
omit both only for local dev):
```bash
hephd --mode server \
  --http-addr 0.0.0.0:8787 \
  --db /var/lib/heph/heph.db \
  --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
  --oidc-audience <heph-client-id>
```
The first identity to authenticate **claims** the hub's owner; thereafter only
that identity is served (single-owner today — see [[design]] and the
`Adoption + multi-tenant` task for the multi-tenancy seam).
## 3. Point `gilbert` at the hub (spoke)
Run `gilbert`'s daemon in local mode with the hub url + its OIDC client id, then
log in once (the device-code flow caches a bearer token in the OS keyring):
```bash
hephd --mode local \
  --hub-url http://indri.<tailnet>.ts.net:8787 \
  --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
  --oidc-client-id <heph-client-id>

# one-time browser login on this device:
heph auth login \
  --hub-url http://indri.<tailnet>.ts.net:8787 \
  --issuer https://authentik.ops.eblu.me/application/o/heph/ \
  --client-id <heph-client-id>
```
The spoke now attaches the (auto-refreshing) bearer token to every hub request
and background-syncs on its interval.
## 4. Verify
```bash
heph sync --status # hub url, last push/pull cursors, sync health
heph sync # force a cycle now
```
`heph sync --status` also reports **sync health** — the time of the last
successful exchange, any last error, and whether the spoke is currently failing
to authenticate. The same signal is surfaced live in `heph-tui`'s status line
(last-sync age · pending conflicts · an auth-failure flag), so a silently-broken
spoke is visible at a glance rather than buried in the daemon log.

Make a change on `gilbert`, force a sync, and confirm it appears via the hub.
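
If the hub is briefly unreachable (for example, right after it starts), forcing a sync may take a few attempts. A minimal sketch of a retry wrapper follows; `retry` is a hypothetical helper, and using it with `heph sync` assumes the command exits non-zero when a cycle fails, which this guide does not confirm.

```bash
# Hypothetical helper (not part of heph): run a command until it succeeds.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  # Re-run the command until it exits 0 or the attempt budget is spent.
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (assumes `heph sync` signals failure through its exit status):
#   retry 5 2 heph sync && heph sync --status
```

A fixed delay keeps the sketch short; if the exit-status assumption does not hold, check `heph sync --status` by hand instead.
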
## Current gaps (finalized by the blumeops deployment)
The flag-level flow above works today; two enablers make it a clean, managed
deployment rather than a hand-run process — tracked in the `Hephaestus` project:
- **`heph daemon` only generates a `--mode local` service** (no `--hub-url` /
`--oidc-*`). So for now the hub and the spoke config are expressed as `hephd`
flags (run directly, or via the blumeops-managed systemd unit), not via
`heph daemon start`.
- **Path A seeding is manual** (copy the store + reset the device origin). A
small enabler — seed a hub from a snapshot with a fresh origin, or
`hephd --owner-id` — would make this one step.
## Related
- [[run-the-daemon]] — manage the local daemon as an OS service
- [[install-heph]] — install `heph`/`hephd` and the plugin
- [[design]] — §4 the connect-only, hub-and-spoke model
- [[v1-prototype-tech-spec]] — §3 runtime modes, §12 sync, §13 auth