diff --git a/CHANGELOG.md b/CHANGELOG.md index 9799e3f..1900ad9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). +## [v1.4.1] - 2026-06-08 + +### Bug Fixes + +- The `heph` CLI and `heph-tui` now survive a daemon restart. Previously the unix-socket client connected once and never reconnected, so an opt-in self-update or `heph daemon restart` left every subsequent call failing — `heph-tui` would sit on errors until relaunched. The client now reconnects on a dropped socket: a request that never went out is retried transparently, while a reply lost mid-request is surfaced (not silently retried) so a mutation is never double-applied. A long-running TUI self-heals on its next refresh tick. +- Quick-add popover (⌘'): hand keyboard focus back to the previously active app when it hides, and stop the (now invisible) overlay from intercepting clicks where it used to sit. + + +## [v1.4.0] - 2026-06-08 + +### Features + +- Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip. +- `heph daemon start`/`restart` can now bake the daemon's full runtime config into the managed service — `--mode`, `--hub-url`, `--http-addr`, `--oidc-issuer`/`--oidc-audience`/`--oidc-client-id`, and `--self-update-interval-secs` (previously only the bare `--self-update` bool was wired). Regenerating preserves whatever is already baked into the on-disk plist/unit, so a bare `start`/`restart` no longer silently drops spoke/hub or self-update config. 
+- heph-tui's sync indicator now shows the last-sync age in seconds under a minute (`⟳ 26s`) instead of a flat `just now`, so the chip reads as a live heartbeat and a missed sync (the loop runs every 30s) shows up as the age climbing. + +### Bug Fixes + +- hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a *rejection* — `identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication. +- `heph daemon restart` on macOS no longer intermittently fails with `launchctl bootstrap failed: 5: Input/output error`. The old code bootstrapped immediately after `bootout`, racing launchd's asynchronous teardown; it now waits for the service to fully unload and retries the bootstrap. When the plist is unchanged (e.g. a plain binary upgrade) it uses `launchctl kickstart -k` to restart the loaded job atomically, sidestepping the bootout→bootstrap dance entirely. + + ## [v1.2.3] - 2026-06-06 ### Features diff --git a/Cargo.lock b/Cargo.lock index be8f974..cc9b3a6 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -2237,6 +2237,8 @@ dependencies = [ "heph-core", "hephd", "libc", + "objc2 0.6.4", + "objc2-app-kit 0.3.2", "serde_json", "winit", ] diff --git a/crates/heph-quickadd/Cargo.toml b/crates/heph-quickadd/Cargo.toml index 5b1889b..57bbb98 100644 --- a/crates/heph-quickadd/Cargo.toml +++ b/crates/heph-quickadd/Cargo.toml @@ -19,7 +19,16 @@ global-hotkey = "0.8" # macOS-only: winit for the accessory-mode activation policy (no Dock icon), # pinned to the same minor eframe carries so cargo unifies to one winit; libc -# for getppid() (orphan detection — self-exit when the supervising daemon dies). 
+# for getppid() (orphan detection — self-exit when the supervising daemon dies); +# objc2 + objc2-app-kit to hand keyboard focus back to the previously active app +# when the popover hides (NSApplication.hide:/unhide:). Pinned to the 0.6/0.3 +# line global-hotkey already pulls in, so cargo unifies to one copy. [target.'cfg(target_os = "macos")'.dependencies] winit = "0.30" libc = "0.2" +objc2 = "0.6" +objc2-app-kit = { version = "0.3", default-features = false, features = [ + "std", + "NSApplication", + "NSResponder", +] } diff --git a/crates/heph-quickadd/src/app.rs b/crates/heph-quickadd/src/app.rs index b08bf03..a334b22 100644 --- a/crates/heph-quickadd/src/app.rs +++ b/crates/heph-quickadd/src/app.rs @@ -226,6 +226,9 @@ impl QuickAdd { } fn show(&mut self, ctx: &egui::Context) { + // Undo the app-level hide from the previous `hide()` so we can take focus + // again (no-op the first time / off macOS). + app_take_focus(); self.visible = true; self.focus_pending = true; self.current_hint = random_hint(self.current_hint); @@ -256,6 +259,13 @@ impl QuickAdd { ctx.send_viewport_cmd(egui::ViewportCommand::InnerSize(egui::vec2(WIN_W, BASE_H))); self.win_h_applied = BASE_H; } + // Hand keyboard focus back to the app underneath us. winit's + // `Visible(false)` alone leaves *us* the active application, so focus + // never returns and the borderless always-on-top overlay can keep eating + // clicks where it used to sit. `NSApplication.hide:` orders our windows + // fully out and activates the next app in line — exactly the one the user + // was in (no-op off macOS). + app_yield_focus(); } /// Optimistic submit: hide now, create in the background. @@ -596,6 +606,39 @@ impl QuickAdd { } } +/// Hide the popover at the *application* level so macOS hands keyboard focus +/// back to the previously active app. 
`NSApplication.hide:` orders all our +/// windows out and activates the next app in line — the one the user was in — +/// which a plain winit `Visible(false)` does not do. No-op off macOS. +#[cfg(target_os = "macos")] +fn app_yield_focus() { + use objc2::MainThreadMarker; + use objc2_app_kit::NSApplication; + // eframe's `update` runs on the main thread, so this marker is always Some. + if let Some(mtm) = MainThreadMarker::new() { + NSApplication::sharedApplication(mtm).hide(None); + } +} + +#[cfg(not(target_os = "macos"))] +fn app_yield_focus() {} + +/// Undo [`app_yield_focus`]: clear the app-level hidden flag before re-showing, +/// so the window the viewport `Focus` command then makes key actually appears. +/// (`unhide:` also re-activates us; the per-window `Focus`/`Visible` viewport +/// commands do the rest.) No-op off macOS. +#[cfg(target_os = "macos")] +fn app_take_focus() { + use objc2::MainThreadMarker; + use objc2_app_kit::NSApplication; + if let Some(mtm) = MainThreadMarker::new() { + NSApplication::sharedApplication(mtm).unhide(None); + } +} + +#[cfg(not(target_os = "macos"))] +fn app_take_focus() {} + /// The current parent process id, for orphan detection. `None` off macOS (where /// hephd does not supervise a helper — there is no Aqua session to inherit). fn current_parent_pid() -> Option { diff --git a/crates/hephd/src/client.rs b/crates/hephd/src/client.rs index c3c008b..8a2bd5d 100644 --- a/crates/hephd/src/client.rs +++ b/crates/hephd/src/client.rs @@ -2,59 +2,145 @@ //! //! Used by the `heph` CLI and by tests. Surfaces never touch SQLite directly //! (tech-spec §3) — they go through the daemon socket, which this wraps. +//! +//! The connection self-heals across daemon restarts (opt-in self-update, `heph +//! daemon restart`): a [`call`](Client::call) that finds the socket dropped +//! reconnects. It only auto-retries when the request provably never reached the +//! daemon (a write-side failure); a reply lost *after* sending is surfaced +//! 
+//! rather than retried, so a mutation is never silently double-applied. use std::io::{BufRead, BufReader, Write}; use std::os::unix::net::UnixStream; -use std::path::Path; +use std::path::{Path, PathBuf}; -use anyhow::{bail, Context, Result}; +use anyhow::{anyhow, Context, Result}; use serde_json::{json, Value}; use crate::rpc::Response; /// A connected client. One request/response per [`call`](Client::call). pub struct Client { + socket_path: PathBuf, reader: BufReader<UnixStream>, writer: UnixStream, next_id: u64, } +/// How a single request/response exchange failed — drives the retry decision. +enum ExchangeError { + /// The request could not be written (broken pipe, reset): it never reached + /// the daemon, so retrying on a fresh connection is safe. + Send(anyhow::Error), + /// The request was sent but no reply came back (the daemon closed mid-flight, + /// e.g. it restarted): it may or may not have applied — do not retry. + Recv(anyhow::Error), + /// A well-formed RPC-level error (or an unparseable reply): the connection is + /// fine; nothing to reconnect. + Rpc(anyhow::Error), +} + +impl ExchangeError { + fn into_inner(self) -> anyhow::Error { + match self { + ExchangeError::Send(e) | ExchangeError::Recv(e) | ExchangeError::Rpc(e) => e, + } + } +} + impl Client { /// Connect to a daemon listening at `socket_path`. pub fn connect(socket_path: &Path) -> Result<Self> { - let stream = UnixStream::connect(socket_path) - .with_context(|| format!("connecting to hephd at {}", socket_path.display()))?; - let reader = BufReader::new(stream.try_clone()?); + let (reader, writer) = Self::open(socket_path)?; Ok(Client { + socket_path: socket_path.to_path_buf(), reader, - writer: stream, + writer, next_id: 1, }) } + /// Open a fresh reader/writer pair on the socket. 
+ fn open(socket_path: &Path) -> Result<(BufReader<UnixStream>, UnixStream)> { + let stream = UnixStream::connect(socket_path) + .with_context(|| format!("connecting to hephd at {}", socket_path.display()))?; + let reader = BufReader::new(stream.try_clone()?); + Ok((reader, stream)) + } + + /// Re-establish the connection (after the daemon restarted and dropped it). + fn reconnect(&mut self) -> Result<()> { + let (reader, writer) = Self::open(&self.socket_path)?; + self.reader = reader; + self.writer = writer; + Ok(()) + } + /// Call `method` with `params`, returning the `result` value (or an error /// carrying the RPC error's code and message). + /// + /// If the daemon has restarted and dropped the socket, this reconnects: it + /// retries transparently when the request never went out, and otherwise + /// reconnects for the next call while surfacing an error for this one (so a + /// mutation whose reply was lost is not silently re-applied). pub fn call(&mut self, method: &str, params: Value) -> Result<Value> { let id = self.next_id; self.next_id += 1; - let mut line = serde_json::to_string(&json!({ "id": id, "method": method, "params": params, }))?; line.push('\n'); - self.writer.write_all(line.as_bytes())?; - self.writer.flush()?; + + match self.exchange(&line) { + Ok(v) => Ok(v), + Err(ExchangeError::Rpc(e)) => Err(e), + Err(ExchangeError::Send(_)) => { + // The request never reached the daemon — reconnect and retry once. + self.reconnect() + .context("hephd connection lost and reconnect failed")?; + self.exchange(&line) + .map_err(ExchangeError::into_inner) + .with_context(|| format!("retrying `{method}` after reconnect")) + } + Err(ExchangeError::Recv(e)) => { + // Sent but no reply: the daemon likely restarted mid-request. Don't + // retry (a mutation may have applied); reconnect for next time and + // surface this one. 
+ let _ = self.reconnect(); + Err(e).context( + "hephd closed the connection mid-request (it likely restarted); \ + reconnected — re-run the action if it didn't take effect", + ) + } + } + } + + /// One request/response over the current connection, classifying failures. + fn exchange(&mut self, line: &str) -> std::result::Result<Value, ExchangeError> { + self.writer + .write_all(line.as_bytes()) + .map_err(|e| ExchangeError::Send(e.into()))?; + self.writer + .flush() + .map_err(|e| ExchangeError::Send(e.into()))?; let mut response_line = String::new(); - let read = self.reader.read_line(&mut response_line)?; + let read = self + .reader + .read_line(&mut response_line) + .map_err(|e| ExchangeError::Recv(e.into()))?; if read == 0 { - bail!("hephd closed the connection"); + return Err(ExchangeError::Recv(anyhow!("hephd closed the connection"))); } - let response: Response = serde_json::from_str(&response_line)?; + let response: Response = + serde_json::from_str(&response_line).map_err(|e| ExchangeError::Rpc(e.into()))?; if let Some(err) = response.error { - bail!("rpc error {}: {}", err.code, err.message); + return Err(ExchangeError::Rpc(anyhow!( + "rpc error {}: {}", + err.code, + err.message + ))); } Ok(response.result.unwrap_or(Value::Null)) } diff --git a/crates/hephd/tests/client_reconnect.rs b/crates/hephd/tests/client_reconnect.rs new file mode 100644 index 0000000..a4d0074 --- /dev/null +++ b/crates/hephd/tests/client_reconnect.rs @@ -0,0 +1,96 @@ +//! [`Client`] survives the daemon dropping the socket (opt-in self-update, `heph +//! daemon restart`). A mock daemon serves exactly one request per connection +//! then closes it, forcing the client to reconnect — without auto-reconnect, +//! every call after the first would fail forever. 
+ +use std::io::{BufRead, BufReader, Write}; +use std::os::unix::net::UnixListener; +use std::path::PathBuf; +use std::sync::atomic::{AtomicUsize, Ordering}; +use std::sync::Arc; +use std::thread; +use std::time::Duration; + +use hephd::Client; +use serde_json::{json, Value}; + +/// A mock daemon that handles ONE request per connection then closes it, looping +/// to accept the next connection. `served` counts total requests answered. +fn spawn_one_shot_daemon(socket: PathBuf, served: Arc<AtomicUsize>) { + thread::spawn(move || { + let listener = UnixListener::bind(&socket).unwrap(); + for conn in listener.incoming() { + let Ok(mut stream) = conn else { continue }; + let mut reader = BufReader::new(stream.try_clone().unwrap()); + let mut line = String::new(); + if reader.read_line(&mut line).unwrap_or(0) == 0 { + continue; // client opened then went away; wait for the next one + } + let req: Value = serde_json::from_str(&line).unwrap(); + let n = served.fetch_add(1, Ordering::SeqCst) + 1; + let mut out = serde_json::to_string(&json!({ + "id": req["id"], + "result": { "served": n }, + })) + .unwrap(); + out.push('\n'); + let _ = stream.write_all(out.as_bytes()); + let _ = stream.flush(); + // `stream` drops here → the connection closes after one request. + } + }); +} + +fn wait_for(socket: &std::path::Path) { + for _ in 0..400 { + if socket.exists() { + return; + } + thread::sleep(Duration::from_millis(5)); + } + panic!("mock daemon socket never appeared"); +} + +#[test] +fn client_reconnects_after_the_daemon_drops_the_socket() { + let dir = tempfile::tempdir().unwrap(); + let socket = dir.path().join("d.sock"); + let served = Arc::new(AtomicUsize::new(0)); + spawn_one_shot_daemon(socket.clone(), served.clone()); + wait_for(&socket); + + let mut c = Client::connect(&socket).unwrap(); + + // First call works on the initial connection. + let r1 = c.call("ping", json!({})).unwrap(); + assert_eq!(r1["served"], 1); + + // The daemon has now closed that connection. 
With reconnect, the client + // recovers within a call or two (depending on whether the dead socket fails + // on write or on read); without it, every further call would fail forever. + let mut recovered = None; + for _ in 0..2 { + if let Ok(v) = c.call("ping", json!({})) { + recovered = Some(v); + break; + } + } + let r = recovered.expect("client should reconnect after the socket was dropped"); + // The recovered call was served exactly once on the new connection — no + // double-serve from a spurious retry. + assert_eq!(r["served"], 2); + assert_eq!(served.load(Ordering::SeqCst), 2); + + // And it keeps working across subsequent drops. + let r3 = { + let mut got = None; + for _ in 0..2 { + if let Ok(v) = c.call("ping", json!({})) { + got = Some(v); + break; + } + } + got.expect("client should keep reconnecting") + }; + assert_eq!(r3["served"], 3); +} diff --git a/docs/changelog.d/+sync-age-seconds.feature.md b/docs/changelog.d/+sync-age-seconds.feature.md deleted file mode 100644 index cf453c2..0000000 --- a/docs/changelog.d/+sync-age-seconds.feature.md +++ /dev/null @@ -1 +0,0 @@ -heph-tui's sync indicator now shows the last-sync age in seconds under a minute (`⟳ 26s`) instead of a flat `just now`, so the chip reads as a live heartbeat and a missed sync (the loop runs every 30s) shows up as the age climbing. diff --git a/docs/changelog.d/auth-error-clarity.bugfix.md b/docs/changelog.d/auth-error-clarity.bugfix.md deleted file mode 100644 index 83ba854..0000000 --- a/docs/changelog.d/auth-error-clarity.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a *rejection* — `identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. 
This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication. diff --git a/docs/changelog.d/auth-error-clarity.feature.md b/docs/changelog.d/auth-error-clarity.feature.md deleted file mode 100644 index ab67867..0000000 --- a/docs/changelog.d/auth-error-clarity.feature.md +++ /dev/null @@ -1 +0,0 @@ -Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip. diff --git a/docs/changelog.d/daemon-restart-race.bugfix.md b/docs/changelog.d/daemon-restart-race.bugfix.md deleted file mode 100644 index c13a257..0000000 --- a/docs/changelog.d/daemon-restart-race.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -`heph daemon restart` on macOS no longer intermittently fails with `launchctl bootstrap failed: 5: Input/output error`. The old code bootstrapped immediately after `bootout`, racing launchd's asynchronous teardown; it now waits for the service to fully unload and retries the bootstrap. When the plist is unchanged (e.g. a plain binary upgrade) it uses `launchctl kickstart -k` to restart the loaded job atomically, sidestepping the bootout→bootstrap dance entirely. 
diff --git a/docs/changelog.d/daemon-self-update-interval.feature.md b/docs/changelog.d/daemon-self-update-interval.feature.md deleted file mode 100644 index b5ec9b8..0000000 --- a/docs/changelog.d/daemon-self-update-interval.feature.md +++ /dev/null @@ -1 +0,0 @@ -`heph daemon start`/`restart` can now bake the daemon's full runtime config into the managed service — `--mode`, `--hub-url`, `--http-addr`, `--oidc-issuer`/`--oidc-audience`/`--oidc-client-id`, and `--self-update-interval-secs` (previously only the bare `--self-update` bool was wired). Regenerating preserves whatever is already baked into the on-disk plist/unit, so a bare `start`/`restart` no longer silently drops spoke/hub or self-update config. diff --git a/docs/how-to/run-the-daemon.md b/docs/how-to/run-the-daemon.md index cb9e56d..545b3be 100644 --- a/docs/how-to/run-the-daemon.md +++ b/docs/how-to/run-the-daemon.md @@ -86,6 +86,14 @@ still the old binary until you restart it: heph daemon restart ``` +A restart (or an opt-in self-update) drops the daemon's unix socket out from +under any connected surface. The CLI and `heph-tui` **reconnect automatically**: +a request that never reached the daemon is retried transparently on a fresh +connection, and a long-running TUI self-heals on its next tick, so a restart +no longer leaves the agenda view stuck on errors. (A request whose reply is lost +mid-restart reports "reconnected — re-run the action if it didn't take effect" +rather than risk applying a mutation twice.) + ## Self-update (opt-in) `hephd` can keep itself current: `heph daemon start --self-update` generates a