fix(hephd): reconnect the socket client across daemon restarts

`Client` connected to the unix socket once and never reconnected, so after an
opt-in self-update or `heph daemon restart` dropped the socket, every later
`call()` failed — `heph-tui` would sit on errors until relaunched (and the work
we just shipped makes restarts more frequent).

`Client` now stores the socket path and reconnects on a dropped connection,
classifying the failure to stay safe:
- write-side failure (request never reached the daemon) → reconnect + retry once;
- reply lost after sending (daemon closed mid-request) → reconnect for next time
  but surface this one, so a mutation is never silently double-applied;
- genuine RPC errors are passed through untouched.

heph-tui and the CLI use `Client` unchanged, so the TUI self-heals on its next
refresh tick. Adds an integration test driving a mock daemon that drops the
connection after each request.

Closes the "heph-tui: reconnect on a dropped daemon socket" backlog task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-08 15:19:10 -07:00

Run the heph daemon

heph and heph.nvim are thin clients — they talk to a hephd daemon over a unix socket and never start one themselves (design §4). Run the daemon as an OS-managed service with heph daemon:

heph daemon start      # install + start (idempotent)
heph daemon status     # is it installed/running? where are its socket/db/log?
heph daemon restart    # restart — run this after upgrading the binary
heph daemon stop       # stop it now
heph daemon uninstall  # stop and remove the service for good

All verbs are idempotent — start when it's already running is a no-op, stop when it's already stopped is fine.

What it manages

macOS — a launchd LaunchAgent (org.hephaestus.hephd) at ~/Library/LaunchAgents/org.hephaestus.hephd.plist, with RunAtLoad + KeepAlive (starts at login, restarts if it crashes).
Linux — a systemd user service (heph.service) at ~/.config/systemd/user/heph.service, with Restart=always, enabled for login.

Upgrading from an older install: earlier units used Restart=on-failure, which does not respawn after a clean exit — so opt-in self-update (which exits cleanly to hand off to the new binary) wouldn't come back on Linux. Run heph daemon restart once (it regenerates the unit) to pick up Restart=always.
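The difference between the two policies can be modelled in a few lines of Rust. This is a deliberately simplified sketch, not systemd's full rule set: it looks only at the exit code, and the `RestartPolicy` and `respawns` names are invented for illustration.

```rust
/// Simplified model of the two systemd `Restart=` settings discussed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RestartPolicy {
    OnFailure,
    Always,
}

/// Would the service manager bring the daemon back after it exits with
/// `exit_code`? Only the exit code is modelled here; signals, timeouts and
/// explicit stops are left out.
pub fn respawns(policy: RestartPolicy, exit_code: i32) -> bool {
    match policy {
        // A clean exit (code 0) is not a failure, so nothing is respawned.
        RestartPolicy::OnFailure => exit_code != 0,
        // Respawn regardless of how the process exited.
        RestartPolicy::Always => true,
    }
}
```

With this model, `respawns(RestartPolicy::OnFailure, 0)` is `false`, which is the self-update hand-off case: the old binary exits cleanly and nothing starts the new one.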

By default it runs hephd --mode local against the default store (~/.local/share/heph/heph.db) and socket, with logs at ~/.local/share/heph/hephd.log. Pass flags to start/restart to bake a different runtime config into the service (see below).

stop vs uninstall: stop halts the daemon now but leaves the service installed, so on macOS it starts again at the next login. Use uninstall to remove the service so that it stays stopped.

Baking sync config (spoke / hub)

By default the service runs a standalone --mode local daemon. To make the managed service a spoke (background-syncs to a hub) or a hub (--mode server), pass the corresponding hephd flags to start (or restart) — they get baked into the generated plist/unit:

# Spoke: sync to a hub, authenticating with OIDC
heph daemon start \
  --hub-url http://hub.example:8787 \
  --oidc-issuer https://idp.example/application/o/heph/ \
  --oidc-client-id heph

# Hub: expose the authenticated sync endpoint
heph daemon start --mode server \
  --http-addr 0.0.0.0:8787 \
  --oidc-issuer https://idp.example/application/o/heph/ \
  --oidc-audience heph

Bakeable flags: --mode, --hub-url, --http-addr, --oidc-issuer, --oidc-audience, --oidc-client-id, --self-update, --self-update-interval-secs. Regenerating preserves what's already baked in — start/restart read the existing service file and carry over any flags you don't pass, so a bare heph daemon restart never drops your spoke/hub or self-update config. Pass a flag again to add or override it.
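The carry-over rule amounts to a map merge in which flags passed on the command line win over flags already baked in. The Rust sketch below illustrates that rule only; the `Flags` type and the `merge_flags` and `flags` helpers are hypothetical names, and how heph actually parses the existing plist/unit is not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of baked flags: name -> value, with `None` for a bare
/// switch such as `--self-update`.
pub type Flags = BTreeMap<String, Option<String>>;

/// Start from what the existing service file has baked in, then let every
/// flag passed on this invocation add to or override it.
pub fn merge_flags(existing: &Flags, passed: &Flags) -> Flags {
    let mut merged = existing.clone();
    for (name, value) in passed {
        merged.insert(name.clone(), value.clone());
    }
    merged
}

/// Small helper for building a `Flags` map in examples.
pub fn flags(pairs: &[(&str, Option<&str>)]) -> Flags {
    pairs
        .iter()
        .map(|&(name, value)| (name.to_string(), value.map(String::from)))
        .collect()
}
```

Merging an empty `passed` map returns the existing flags unchanged, which corresponds to a bare heph daemon restart keeping the spoke/hub and self-update settings.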

Spoke sync is HTTP-only today (hephd's sync client doesn't speak HTTPS) — a --hub-url over the tailnet or behind a TLS-terminating proxy is the usual setup.

After upgrading

When you rebuild/reinstall (cargo install … --force), the running daemon is still the old binary until you restart it:

heph daemon restart

A restart (or an opt-in self-update) drops the daemon's unix socket out from under any connected surface. The CLI and heph-tui reconnect automatically: a read transparently retries on a fresh connection, and a long-running TUI self-heals on its next tick — so a daemon restart no longer leaves the agenda view stuck on errors. (A mutating action whose reply is lost mid-restart reports "reconnected — re-run the action if it didn't take effect" rather than risk applying twice.)
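The reconnect behaviour can be sketched with the standard library alone. The code below is a minimal illustration, not hephd's actual `Client`: the `ReconnectingClient` and `CallError` names and the one-line-request, one-line-reply wire format are assumptions made for the example. It retries once when the write fails, because the request never arrived as a complete line, and it surfaces a lost reply instead of retrying, because the daemon may already have applied the request.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::PathBuf;

/// Why a call failed, from the caller's point of view.
#[derive(Debug)]
pub enum CallError {
    /// The request could not be delivered, even after one reconnect.
    Unavailable(io::Error),
    /// The request was sent but the reply never arrived. It may or may not
    /// have been applied, so it is surfaced instead of retried.
    ReplyLost(io::Error),
}

/// Hypothetical client: one newline-terminated request, one reply line.
pub struct ReconnectingClient {
    path: PathBuf,
    conn: Option<BufReader<UnixStream>>,
}

impl ReconnectingClient {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), conn: None }
    }

    /// Write one request line, connecting first if there is no live connection.
    fn send(&mut self, request: &str) -> io::Result<()> {
        if self.conn.is_none() {
            let stream = UnixStream::connect(&self.path)?;
            self.conn = Some(BufReader::new(stream));
        }
        let stream = self.conn.as_mut().expect("connected above").get_mut();
        stream.write_all(request.as_bytes())?;
        stream.write_all(b"\n")?;
        stream.flush()
    }

    pub fn call(&mut self, request: &str) -> Result<String, CallError> {
        // Write side: the daemon never saw a complete request line, so a
        // single retry on a fresh connection cannot double-apply anything.
        if self.send(request).is_err() {
            self.conn = None;
            if let Err(e) = self.send(request) {
                self.conn = None;
                return Err(CallError::Unavailable(e));
            }
        }
        // Read side: the request was sent. If the reply is lost, drop the
        // connection so the next call reconnects, and report this one.
        let mut reply = String::new();
        let read = self.conn.as_mut().expect("send succeeded").read_line(&mut reply);
        match read {
            Ok(n) if n > 0 => Ok(reply.trim_end().to_string()),
            Ok(_) => {
                self.conn = None;
                Err(CallError::ReplyLost(io::ErrorKind::UnexpectedEof.into()))
            }
            Err(e) => {
                self.conn = None;
                Err(CallError::ReplyLost(e))
            }
        }
    }
}
```

In this sketch the reconnect after a lost reply is lazy: the stale connection is discarded and the next `call` opens a new one. A surface such as a TUI would show `ReplyLost` as a message and let the user decide whether to re-run the action.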

Self-update (opt-in)

hephd can keep itself current: heph daemon start --self-update generates a service that polls the forge for newer releases and, when one appears, rebuilds via cargo install (an anonymous HTTPS clone of the public repo, so no credentials are needed) and restarts onto the new binary. It is off by default, and it requires the Rust toolchain (cargo) installed for the service user; the generated service also gets a PATH that can find cargo. The default poll cadence is 6h; override it with --self-update-interval-secs <secs>. Both start and restart preserve an already-baked self-update setting (and its interval), so a bare invocation won't silently disable it. You only need to pass --self-update when turning it on for the first time.

Development isolation

heph daemon manages the installed daemon on the default paths. For in-repo development, run the working-tree daemon on separate paths instead and point a dev Neovim/CLI at it (never touches your real store):

mise run dev                                              # working-tree hephd on .dev/ paths
HEPH_SOCKET="$PWD/.dev/hephd.sock" HEPH_DB="$PWD/.dev/heph.db" nvim
HEPH_SOCKET="$PWD/.dev/hephd.sock" heph next

install-heph — install heph/hephd and the plugin
design — §4 the connect-only surface model
