Reconnect the socket client across daemon restarts (heph-tui survives self-update) #15

Merged

eblume merged 1 commit from feature/client-reconnect into main

2026-06-08 15:22:15 -07:00

eblume commented

2026-06-08 15:20:16 -07:00

Owner

Summary

hephd::Client opened the unix socket once at connect() and never reconnected — it didn't even keep the socket path. So when the daemon restarted (opt-in self-update, now at a 600s interval on gilbert, or heph daemon restart), the stream broke and every subsequent call() failed: heph-tui would sit on error: … until you quit and relaunched. The self-update + restart work shipped in v1.4.0 makes those restarts more frequent, so this hardening is the natural follow-on.

Client now stores the socket path and reconnects on a dropped connection, classifying the failure to stay correct:

- write-side failure (request provably never reached the daemon) → reconnect + retry once, transparently;
- reply lost after sending (daemon closed mid-request) → reconnect for the next call but surface this one with "reconnected — re-run the action if it didn't take effect", so a mutation is never silently double-applied;
- genuine RPC errors pass through untouched (no needless reconnect).
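
The classification above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the `CallError` type, the closure-based `connect` field (standing in for the stored socket path), and the newline-delimited framing are all assumptions made for the example. It also simplifies by treating any write error as "the daemon never saw the request", and it leaves reply parsing (where genuine RPC errors would pass through) out of scope.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Why a call failed (hypothetical names, not hephd's actual error type).
#[derive(Debug)]
pub enum CallError {
    /// The request was written but no reply arrived. The daemon may or may
    /// not have applied it, so the caller decides whether to re-run.
    ReplyLost,
    /// Reconnecting or resending failed; the request was never delivered.
    Unreachable(io::Error),
}

/// A client that remembers how to reconnect. For a real unix socket,
/// `connect` could be `move || UnixStream::connect(&socket_path)`.
pub struct Client<S, F> {
    connect: F,
    stream: BufReader<S>,
}

impl<S: Read + Write, F: FnMut() -> io::Result<S>> Client<S, F> {
    pub fn new(mut connect: F) -> io::Result<Self> {
        let stream = BufReader::new(connect()?);
        Ok(Self { connect, stream })
    }

    fn reconnect(&mut self) -> io::Result<()> {
        self.stream = BufReader::new((self.connect)()?);
        Ok(())
    }

    /// Assumed framing: one request per line.
    fn send(&mut self, request: &str) -> io::Result<()> {
        let out = self.stream.get_mut();
        out.write_all(request.as_bytes())?;
        out.write_all(b"\n")?;
        out.flush()
    }

    pub fn call(&mut self, request: &str) -> Result<String, CallError> {
        // Write-side failure: assume the daemon never saw the request,
        // so reconnect and retry exactly once.
        if self.send(request).is_err() {
            self.reconnect().map_err(CallError::Unreachable)?;
            self.send(request).map_err(CallError::Unreachable)?;
        }

        let mut reply = String::new();
        match self.stream.read_line(&mut reply) {
            Ok(n) if n > 0 => Ok(reply.trim_end().to_owned()),
            // EOF or read error after a successful write: reconnect so the
            // next call works, but never resend this request.
            _ => {
                let _ = self.reconnect(); // best effort; the next call retries
                Err(CallError::ReplyLost)
            }
        }
    }
}
```

The asymmetry is the point of the design: a failed write is retried because nothing was delivered, while a lost reply is surfaced because resending could apply a mutation twice.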

heph-tui and the CLI use Client unchanged, so the TUI self-heals on its next refresh tick — no heph-tui code change, lower risk, and the CLI benefits too.

Testing

- [x] New tests/client_reconnect.rs: a mock daemon that closes the connection after each request; asserts the client recovers (within a call or two), the recovered call is served exactly once (no double-serve), and it keeps recovering across repeated drops.
- [x] cargo test --workspace green (incl. the existing 17 rpc_socket tests, unchanged); cargo clippy -p hephd --all-targets clean; fmt + prek pass.
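
For readers without the test file at hand, a mock daemon of the kind described above might look something like this. It is a sketch under assumptions, not the contents of tests/client_reconnect.rs: the `spawn_mock_daemon` helper, the one-line-per-request framing, and the fixed `ok` reply are all hypothetical.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixListener;
use std::path::Path;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Serve up to `connections` clients, one request each, closing the socket
/// after every reply. Returns the worker handle and a counter of requests
/// actually served, so a test can assert "served exactly once".
pub fn spawn_mock_daemon(
    path: &Path,
    connections: usize,
) -> std::io::Result<(JoinHandle<()>, Arc<AtomicUsize>)> {
    let _ = std::fs::remove_file(path); // clear any stale socket file
    // Bind before spawning so clients can connect as soon as this returns.
    let listener = UnixListener::bind(path)?;
    let served = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&served);

    let handle = thread::spawn(move || {
        for stream in listener.incoming().take(connections) {
            let Ok(stream) = stream else { continue };
            let mut reader = BufReader::new(stream);
            let mut request = String::new();
            if matches!(reader.read_line(&mut request), Ok(n) if n > 0) {
                // Count the request before replying, so the counter is
                // already updated when the client sees the reply.
                counter.fetch_add(1, Ordering::SeqCst);
                let _ = reader.get_mut().write_all(b"ok\n");
            }
            // `reader` drops here, which closes the connection and forces
            // the client to reconnect for its next request.
        }
    });
    Ok((handle, served))
}
```

A test would point the client at the same socket path, issue several calls in a row, and compare the number of successful calls against the `served` counter to rule out a double-serve.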

Closes

01KTAKXZ… — "heph-tui: reconnect on a dropped daemon socket (survive hephd restarts / self-update)"

🤖 Generated with Claude Code

eblume added 1 commit

2026-06-08 15:20:16 -07:00

fix(hephd): reconnect the socket client across daemon restarts

Build / validate (pull_request) Successful in 8m7s

b04a71421e

`Client` connected to the unix socket once and never reconnected, so after an
opt-in self-update or `heph daemon restart` dropped the socket, every later
`call()` failed — `heph-tui` would sit on errors until relaunched (and the work
we just shipped makes restarts more frequent).

`Client` now stores the socket path and reconnects on a dropped connection,
classifying the failure to stay safe:
- write-side failure (request never reached the daemon) → reconnect + retry once;
- reply lost after sending (daemon closed mid-request) → reconnect for next time
  but surface this one, so a mutation is never silently double-applied;
- genuine RPC errors are passed through untouched.

heph-tui and the CLI use `Client` unchanged, so the TUI self-heals on its next
refresh tick. Adds an integration test driving a mock daemon that drops the
connection after each request.

Closes the "heph-tui: reconnect on a dropped daemon socket" backlog task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>