Compare commits

..

No commits in common. "main" and "feature/daemon-self-update-interval" have entirely different histories.

18 changed files with 99 additions and 733 deletions

View file

@ -12,28 +12,6 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
<!-- towncrier release notes start --> <!-- towncrier release notes start -->
## [v1.4.1] - 2026-06-08
### Bug Fixes
- The `heph` CLI and `heph-tui` now survive a daemon restart. Previously the unix-socket client connected once and never reconnected, so an opt-in self-update or `heph daemon restart` left every subsequent call failing — `heph-tui` would sit on errors until relaunched. The client now reconnects on a dropped socket: a request that never went out is retried transparently, while a reply lost mid-request is surfaced (not silently retried) so a mutation is never double-applied. A long-running TUI self-heals on its next refresh tick.
- Quick-add popover (⌘'): hand keyboard focus back to the previously active app when it hides, and stop the (now invisible) overlay from intercepting clicks where it used to sit.
## [v1.4.0] - 2026-06-08
### Features
- Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip.
- `heph daemon start`/`restart` can now bake the daemon's full runtime config into the managed service — `--mode`, `--hub-url`, `--http-addr`, `--oidc-issuer`/`--oidc-audience`/`--oidc-client-id`, and `--self-update-interval-secs` (previously only the bare `--self-update` bool was wired). Regenerating preserves whatever is already baked into the on-disk plist/unit, so a bare `start`/`restart` no longer silently drops spoke/hub or self-update config.
- heph-tui's sync indicator now shows the last-sync age in seconds under a minute (`⟳ 26s`) instead of a flat `just now`, so the chip reads as a live heartbeat and a missed sync (the loop runs every 30s) shows up as the age climbing.
### Bug Fixes
- hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a *rejection*`identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication.
- `heph daemon restart` on macOS no longer intermittently fails with `launchctl bootstrap failed: 5: Input/output error`. The old code bootstrapped immediately after `bootout`, racing launchd's asynchronous teardown; it now waits for the service to fully unload and retries the bootstrap. When the plist is unchanged (e.g. a plain binary upgrade) it uses `launchctl kickstart -k` to restart the loaded job atomically, sidestepping the bootout→bootstrap dance entirely.
## [v1.2.3] - 2026-06-06 ## [v1.2.3] - 2026-06-06
### Features ### Features

2
Cargo.lock generated
View file

@ -2237,8 +2237,6 @@ dependencies = [
"heph-core", "heph-core",
"hephd", "hephd",
"libc", "libc",
"objc2 0.6.4",
"objc2-app-kit 0.3.2",
"serde_json", "serde_json",
"winit", "winit",
] ]

View file

@ -19,16 +19,7 @@ global-hotkey = "0.8"
# macOS-only: winit for the accessory-mode activation policy (no Dock icon), # macOS-only: winit for the accessory-mode activation policy (no Dock icon),
# pinned to the same minor eframe carries so cargo unifies to one winit; libc # pinned to the same minor eframe carries so cargo unifies to one winit; libc
# for getppid() (orphan detection — self-exit when the supervising daemon dies); # for getppid() (orphan detection — self-exit when the supervising daemon dies).
# objc2 + objc2-app-kit to hand keyboard focus back to the previously active app
# when the popover hides (NSApplication.hide:/unhide:). Pinned to the 0.6/0.3
# line global-hotkey already pulls in, so cargo unifies to one copy.
[target.'cfg(target_os = "macos")'.dependencies] [target.'cfg(target_os = "macos")'.dependencies]
winit = "0.30" winit = "0.30"
libc = "0.2" libc = "0.2"
objc2 = "0.6"
objc2-app-kit = { version = "0.3", default-features = false, features = [
"std",
"NSApplication",
"NSResponder",
] }

View file

@ -226,9 +226,6 @@ impl QuickAdd {
} }
fn show(&mut self, ctx: &egui::Context) { fn show(&mut self, ctx: &egui::Context) {
// Undo the app-level hide from the previous `hide()` so we can take focus
// again (no-op the first time / off macOS).
app_take_focus();
self.visible = true; self.visible = true;
self.focus_pending = true; self.focus_pending = true;
self.current_hint = random_hint(self.current_hint); self.current_hint = random_hint(self.current_hint);
@ -259,13 +256,6 @@ impl QuickAdd {
ctx.send_viewport_cmd(egui::ViewportCommand::InnerSize(egui::vec2(WIN_W, BASE_H))); ctx.send_viewport_cmd(egui::ViewportCommand::InnerSize(egui::vec2(WIN_W, BASE_H)));
self.win_h_applied = BASE_H; self.win_h_applied = BASE_H;
} }
// Hand keyboard focus back to the app underneath us. winit's
// `Visible(false)` alone leaves *us* the active application, so focus
// never returns and the borderless always-on-top overlay can keep eating
// clicks where it used to sit. `NSApplication.hide:` orders our windows
// fully out and activates the next app in line — exactly the one the user
// was in (no-op off macOS).
app_yield_focus();
} }
/// Optimistic submit: hide now, create in the background. /// Optimistic submit: hide now, create in the background.
@ -606,39 +596,6 @@ impl QuickAdd {
} }
} }
/// Hide the popover at the *application* level so macOS hands keyboard focus
/// back to the previously active app. `NSApplication.hide:` orders all our
/// windows out and activates the next app in line — the one the user was in —
/// which a plain winit `Visible(false)` does not do. No-op off macOS.
#[cfg(target_os = "macos")]
fn app_yield_focus() {
use objc2::MainThreadMarker;
use objc2_app_kit::NSApplication;
// eframe's `update` runs on the main thread, so this marker is always Some.
if let Some(mtm) = MainThreadMarker::new() {
NSApplication::sharedApplication(mtm).hide(None);
}
}
#[cfg(not(target_os = "macos"))]
fn app_yield_focus() {}
/// Undo [`app_yield_focus`]: clear the app-level hidden flag before re-showing,
/// so the window the viewport `Focus` command then makes key actually appears.
/// (`unhide:` also re-activates us; the per-window `Focus`/`Visible` viewport
/// commands do the rest.) No-op off macOS.
#[cfg(target_os = "macos")]
fn app_take_focus() {
use objc2::MainThreadMarker;
use objc2_app_kit::NSApplication;
if let Some(mtm) = MainThreadMarker::new() {
NSApplication::sharedApplication(mtm).unhide(None);
}
}
#[cfg(not(target_os = "macos"))]
fn app_take_focus() {}
/// The current parent process id, for orphan detection. `None` off macOS (where /// The current parent process id, for orphan detection. `None` off macOS (where
/// hephd does not supervise a helper — there is no Aqua session to inherit). /// hephd does not supervise a helper — there is no Aqua session to inherit).
fn current_parent_pid() -> Option<i32> { fn current_parent_pid() -> Option<i32> {

View file

@ -570,9 +570,7 @@ fn sync_indicator(sync: &SyncStatus, now: i64) -> Vec<Span<'static>> {
let health = sync.health.clone().unwrap_or_default(); let health = sync.health.clone().unwrap_or_default();
let mut spans = vec![if health.auth_failure { let mut spans = vec![if health.auth_failure {
// Point at the recovery command — `heph auth status` prints the exact Span::styled("⚠ auth", red)
// `heph auth login …` to run (the full command is too long for the bar).
Span::styled("⚠ auth · heph auth status", red)
} else if let Some(ts) = health.last_success_ms { } else if let Some(ts) = health.last_success_ms {
Span::styled(format!("{}", fmt_age(now, ts)), dim) Span::styled(format!("{}", fmt_age(now, ts)), dim)
} else if health.last_error.is_some() { } else if health.last_error.is_some() {
@ -641,7 +639,7 @@ mod tests {
}, },
0, 0,
); );
assert_eq!(render(&auth, NOW), "⚠ auth · heph auth status"); assert_eq!(render(&auth, NOW), "⚠ auth");
// Errored with no prior success → offline. // Errored with no prior success → offline.
let offline = spoke( let offline = spoke(

View file

@ -344,7 +344,7 @@ enum ConflictAction {
}, },
} }
#[derive(Subcommand, Debug, Clone)] #[derive(Subcommand, Debug)]
enum AuthAction { enum AuthAction {
/// Log in via the device-code flow; caches the bearer token for hub sync. /// Log in via the device-code flow; caches the bearer token for hub sync.
Login { Login {
@ -367,9 +367,6 @@ enum AuthAction {
#[arg(long)] #[arg(long)]
hub_url: String, hub_url: String,
}, },
/// Show this spoke's auth health and, if re-auth is needed, the exact
/// `heph auth login` command to run. Queries the daemon.
Status,
} }
/// Run the device-code flow (or clear a token) — no daemon needed. /// Run the device-code flow (or clear a token) — no daemon needed.
@ -399,63 +396,10 @@ fn run_auth(action: AuthAction) -> Result<()> {
KeyringTokenStore::new(hub_url.as_str()).clear()?; KeyringTokenStore::new(hub_url.as_str()).clear()?;
println!("Logged out of {hub_url}."); println!("Logged out of {hub_url}.");
} }
AuthAction::Status => unreachable!("auth status is handled via the daemon"),
} }
Ok(()) Ok(())
} }
/// Render `heph auth status` from a `sync.status` RPC response: hub/issuer/client
/// id, whether auth is healthy or needs re-login, and — when it does — the exact
/// command to run (built daemon-side, keyed under the right hub URL).
fn print_auth_status(status: &Value) {
let Some(hub) = status.get("hub_url").and_then(Value::as_str) else {
println!("This instance is standalone (no hub configured); auth does not apply.");
return;
};
let auth = status.get("auth");
let issuer = auth.and_then(|a| a.get("issuer")).and_then(Value::as_str);
let client_id = auth
.and_then(|a| a.get("client_id"))
.and_then(Value::as_str);
let health = status.get("health");
let auth_failure = health
.and_then(|h| h.get("auth_failure"))
.and_then(Value::as_bool)
.unwrap_or(false);
let last_error = health
.and_then(|h| h.get("last_error"))
.and_then(Value::as_str);
let last_success = health
.and_then(|h| h.get("last_success_ms"))
.and_then(Value::as_i64);
println!("hub : {hub}");
if let Some(iss) = issuer {
println!("issuer : {iss}");
}
if let Some(cid) = client_id {
println!("client id : {cid}");
}
println!(
"auth : {}",
if auth_failure {
"FAILED — re-authentication required"
} else if last_success.is_some() {
"ok"
} else {
"unknown (no successful sync yet)"
}
);
if let Some(err) = last_error {
println!("last error : {err}");
}
if auth_failure {
if let Some(cmd) = status.get("reauth_command").and_then(Value::as_str) {
println!("\nTo re-authenticate, run:\n {cmd}");
}
}
}
fn main() -> Result<()> { fn main() -> Result<()> {
let cli = Cli::parse(); let cli = Cli::parse();
@ -463,13 +407,9 @@ fn main() -> Result<()> {
if let Command::Daemon { action } = &cli.command { if let Command::Daemon { action } = &cli.command {
return service::run(action); return service::run(action);
} }
// `auth login`/`logout` run locally (device-code flow + keyring); they need // `auth` runs locally (device-code flow + keyring); it needs no daemon.
// no daemon. `auth status` reads live sync health, so it falls through to the if let Command::Auth { action } = cli.command {
// connected path below. return run_auth(action);
if let Command::Auth { action } = &cli.command {
if !matches!(action, AuthAction::Status) {
return run_auth(action.clone());
}
} }
let socket = cli.socket.unwrap_or_else(default_socket_path); let socket = cli.socket.unwrap_or_else(default_socket_path);
@ -850,13 +790,7 @@ fn main() -> Result<()> {
let n = result.as_u64().unwrap_or(0); let n = result.as_u64().unwrap_or(0);
println!("Rewrote legacy [[Name]] links to [[id]] in {n} node(s)."); println!("Rewrote legacy [[Name]] links to [[id]] in {n} node(s).");
} }
Command::Auth { Command::Auth { .. } => unreachable!("auth is handled before connecting"),
action: AuthAction::Status,
} => {
let result = client.call("sync.status", json!({}))?;
print_auth_status(&result);
}
Command::Auth { .. } => unreachable!("auth login/logout handled before connecting"),
Command::Daemon { .. } => unreachable!("daemon is handled before connecting"), Command::Daemon { .. } => unreachable!("daemon is handled before connecting"),
} }
Ok(()) Ok(())

View file

@ -13,7 +13,6 @@
use std::path::{Path, PathBuf}; use std::path::{Path, PathBuf};
use std::process::Command; use std::process::Command;
use std::time::{Duration, Instant};
use anyhow::{bail, Context, Result}; use anyhow::{bail, Context, Result};
use clap::{Args, Subcommand}; use clap::{Args, Subcommand};
@ -495,51 +494,6 @@ fn launchd_loaded(domain_target: &str) -> bool {
.unwrap_or(false) .unwrap_or(false)
} }
/// Block until `target` is no longer loaded, up to `timeout`. `launchctl bootout`
/// is asynchronous in effect — it requests teardown and returns, but launchd may
/// still be killing/reaping the job and removing its label from the domain.
/// Bootstrapping while the label lingers fails with a generic `5: Input/output
/// error`, so we wait for the label to actually disappear before re-bootstrapping.
fn wait_until_unloaded(target: &str, timeout: Duration) {
let start = Instant::now();
while launchd_loaded(target) {
if start.elapsed() >= timeout {
break; // fall through; bootstrap's own retry covers the residual window
}
std::thread::sleep(Duration::from_millis(100));
}
}
/// Bootstrap the service, retrying briefly. Even once the old instance is gone,
/// launchd can momentarily return EIO while the domain settles, so a couple of
/// short retries make `start`/`restart` reliable instead of intermittently failing.
fn launchd_bootstrap(domain: &str, plist: &str) -> Result<()> {
let mut last = String::new();
for attempt in 0..5 {
if attempt > 0 {
std::thread::sleep(Duration::from_millis(200));
}
let (ok, err) = run_cmd("launchctl", &["bootstrap", domain, plist])?;
if ok {
return Ok(());
}
last = err;
}
bail!("launchctl bootstrap failed: {}", last.trim());
}
/// Restart an already-loaded job in place (kills it, then launchd's KeepAlive —
/// `-k` forces the kill). This restarts the *loaded* job definition, so it does
/// not pick up an edited plist — callers use it only when the on-disk plist is
/// unchanged, where it sidesteps the bootout→bootstrap race entirely.
fn launchd_kickstart(target: &str) -> Result<()> {
let (ok, err) = run_cmd("launchctl", &["kickstart", "-k", target])?;
if !ok {
bail!("launchctl kickstart failed: {}", err.trim());
}
Ok(())
}
fn launchd(action: &DaemonAction, p: &Paths) -> Result<()> { fn launchd(action: &DaemonAction, p: &Paths) -> Result<()> {
let plist = launchd_plist_path()?; let plist = launchd_plist_path()?;
let uid = uid()?; let uid = uid()?;
@ -558,7 +512,10 @@ fn launchd(action: &DaemonAction, p: &Paths) -> Result<()> {
if launchd_loaded(&target) { if launchd_loaded(&target) {
println!("heph daemon already running ({LABEL})."); println!("heph daemon already running ({LABEL}).");
} else { } else {
launchd_bootstrap(&domain, &plist_str(&plist)?)?; let (ok, err) = run_cmd("launchctl", &["bootstrap", &domain, &plist_str(&plist)?])?;
if !ok {
bail!("launchctl bootstrap failed: {}", err.trim());
}
println!("heph daemon started ({LABEL})."); println!("heph daemon started ({LABEL}).");
} }
} }
@ -570,24 +527,14 @@ fn launchd(action: &DaemonAction, p: &Paths) -> Result<()> {
let cfg = args let cfg = args
.to_config() .to_config()
.fill_from(existing_config(&plist, &Manager::Launchd)); .fill_from(existing_config(&plist, &Manager::Launchd));
let changed = write_if_changed( write_if_changed(
&plist, &plist,
&launchd_plist(&p.hephd, &p.db, &p.socket, &p.log, &cfg), &launchd_plist(&p.hephd, &p.db, &p.socket, &p.log, &cfg),
)?; )?;
if !launchd_loaded(&target) { let _ = run_cmd("launchctl", &["bootout", &target])?;
// Not currently loaded — nothing to tear down, just bring it up. let (ok, err) = run_cmd("launchctl", &["bootstrap", &domain, &plist_str(&plist)?])?;
launchd_bootstrap(&domain, &plist_str(&plist)?)?; if !ok {
} else if changed { bail!("launchctl bootstrap failed: {}", err.trim());
// The plist changed, so launchd must re-read it: a full reload is
// required. bootout is async, so wait for the label to clear
// before bootstrapping (and bootstrap retries the residual EIO).
let _ = run_cmd("launchctl", &["bootout", &target])?;
wait_until_unloaded(&target, Duration::from_secs(5));
launchd_bootstrap(&domain, &plist_str(&plist)?)?;
} else {
// Same definition (e.g. binary upgraded in place) — restart the
// loaded job atomically, sidestepping the bootout→bootstrap race.
launchd_kickstart(&target)?;
} }
println!("heph daemon restarted ({LABEL})."); println!("heph daemon restarted ({LABEL}).");
} }

View file

@ -38,45 +38,9 @@ pub enum AuthError {
/// The token was present but failed validation. /// The token was present but failed validation.
#[error("invalid token: {0}")] #[error("invalid token: {0}")]
Invalid(String), Invalid(String),
/// The identity provider could not be reached at all (DNS, TLS, connection /// The identity provider could not be reached to fetch keys.
/// refused, timeout) — a transport failure, distinct from a rejection.
#[error("identity provider unreachable: {0}")] #[error("identity provider unreachable: {0}")]
Unreachable(String), Provider(String),
/// The identity provider *was* reached but returned an HTTP error response —
/// e.g. `400 invalid_grant` on a refresh, meaning the token was rejected
/// (expired/rotated/session-invalidated), not that the IdP was down. The
/// distinction matters: "unreachable" sends debugging toward the network;
/// this points at the token/authorization.
#[error("identity provider rejected the request: {0}")]
Rejected(String),
/// Some other failure in the auth path that is neither a transport failure
/// nor an HTTP rejection — a malformed/unparseable IdP response, or a local
/// credential-store (keyring) error. Kept distinct so neither is mislabeled
/// as "unreachable".
#[error("auth error: {0}")]
Other(String),
}
impl AuthError {
/// Build a [`AuthError::Rejected`] from an HTTP status and the OAuth error
/// body (RFC 6749 §5.2), e.g. `HTTP 400 (invalid_grant): Token is expired`.
pub fn rejected(status: u16, error: Option<&str>, description: Option<&str>) -> AuthError {
let mut msg = format!("HTTP {status}");
if let Some(e) = error.filter(|e| !e.is_empty()) {
msg.push_str(&format!(" ({e})"));
}
if let Some(d) = description.filter(|d| !d.is_empty()) {
msg.push_str(&format!(": {d}"));
}
AuthError::Rejected(msg)
}
/// Whether this is an authorization-level rejection (the IdP refused the
/// grant) rather than a transport failure — i.e. re-authentication is the
/// likely fix, not network troubleshooting.
pub fn is_rejection(&self) -> bool {
matches!(self, AuthError::Rejected(_))
}
} }
/// Verifies a bearer token and returns its [`Claims`]. A trait so the hub can be /// Verifies a bearer token and returns its [`Claims`]. A trait so the hub can be
@ -128,13 +92,16 @@ impl OidcVerifier {
.http .http
.get(url) .get(url)
.call() .call()
.map_err(|e| AuthError::Unreachable(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
if !resp.status().is_success() { if !resp.status().is_success() {
return Err(AuthError::rejected(resp.status().as_u16(), None, None)); return Err(AuthError::Provider(format!(
"{url} returned {}",
resp.status()
)));
} }
resp.body_mut() resp.body_mut()
.read_json() .read_json()
.map_err(|e| AuthError::Unreachable(e.to_string())) .map_err(|e| AuthError::Provider(e.to_string()))
} }
/// Resolve the JWKS URI from the provider's discovery document. /// Resolve the JWKS URI from the provider's discovery document.
@ -202,38 +169,3 @@ impl TokenVerifier for OidcVerifier {
Some((&self.issuer, &self.audience)) Some((&self.issuer, &self.audience))
} }
} }
#[cfg(test)]
mod tests {
use super::AuthError;
#[test]
fn rejected_formats_status_error_and_description() {
let e = AuthError::rejected(400, Some("invalid_grant"), Some("Token is not active"));
assert!(e.is_rejection());
assert_eq!(
e.to_string(),
"identity provider rejected the request: HTTP 400 (invalid_grant): Token is not active"
);
}
#[test]
fn rejected_omits_absent_or_empty_oauth_fields() {
// No OAuth body (e.g. a bare 503) → just the status.
assert_eq!(
AuthError::rejected(503, None, None).to_string(),
"identity provider rejected the request: HTTP 503"
);
// Empty strings are treated as absent, not rendered as "()" / ": ".
assert_eq!(
AuthError::rejected(400, Some(""), Some("")).to_string(),
"identity provider rejected the request: HTTP 400"
);
}
#[test]
fn unreachable_is_not_a_rejection() {
assert!(!AuthError::Unreachable("connection refused".into()).is_rejection());
assert!(!AuthError::Other("keyring locked".into()).is_rejection());
}
}

View file

@ -2,145 +2,59 @@
//! //!
//! Used by the `heph` CLI and by tests. Surfaces never touch SQLite directly //! Used by the `heph` CLI and by tests. Surfaces never touch SQLite directly
//! (tech-spec §3) — they go through the daemon socket, which this wraps. //! (tech-spec §3) — they go through the daemon socket, which this wraps.
//!
//! The connection self-heals across daemon restarts (opt-in self-update, `heph
//! daemon restart`): a [`call`](Client::call) that finds the socket dropped
//! reconnects. It only auto-retries when the request provably never reached the
//! daemon (a write-side failure); a reply lost *after* sending is surfaced
//! rather than retried, so a mutation is never silently double-applied.
use std::io::{BufRead, BufReader, Write}; use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream; use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf}; use std::path::Path;
use anyhow::{anyhow, Context, Result}; use anyhow::{bail, Context, Result};
use serde_json::{json, Value}; use serde_json::{json, Value};
use crate::rpc::Response; use crate::rpc::Response;
/// A connected client. One request/response per [`call`](Client::call). /// A connected client. One request/response per [`call`](Client::call).
pub struct Client { pub struct Client {
socket_path: PathBuf,
reader: BufReader<UnixStream>, reader: BufReader<UnixStream>,
writer: UnixStream, writer: UnixStream,
next_id: u64, next_id: u64,
} }
/// How a single request/response exchange failed — drives the retry decision.
enum ExchangeError {
/// The request could not be written (broken pipe, reset): it never reached
/// the daemon, so retrying on a fresh connection is safe.
Send(anyhow::Error),
/// The request was sent but no reply came back (the daemon closed mid-flight,
/// e.g. it restarted): it may or may not have applied — do not retry.
Recv(anyhow::Error),
/// A well-formed RPC-level error (or an unparseable reply): the connection is
/// fine; nothing to reconnect.
Rpc(anyhow::Error),
}
impl ExchangeError {
fn into_inner(self) -> anyhow::Error {
match self {
ExchangeError::Send(e) | ExchangeError::Recv(e) | ExchangeError::Rpc(e) => e,
}
}
}
impl Client { impl Client {
/// Connect to a daemon listening at `socket_path`. /// Connect to a daemon listening at `socket_path`.
pub fn connect(socket_path: &Path) -> Result<Client> { pub fn connect(socket_path: &Path) -> Result<Client> {
let (reader, writer) = Self::open(socket_path)?; let stream = UnixStream::connect(socket_path)
.with_context(|| format!("connecting to hephd at {}", socket_path.display()))?;
let reader = BufReader::new(stream.try_clone()?);
Ok(Client { Ok(Client {
socket_path: socket_path.to_path_buf(),
reader, reader,
writer, writer: stream,
next_id: 1, next_id: 1,
}) })
} }
/// Open a fresh reader/writer pair on the socket.
fn open(socket_path: &Path) -> Result<(BufReader<UnixStream>, UnixStream)> {
let stream = UnixStream::connect(socket_path)
.with_context(|| format!("connecting to hephd at {}", socket_path.display()))?;
let reader = BufReader::new(stream.try_clone()?);
Ok((reader, stream))
}
/// Re-establish the connection (after the daemon restarted and dropped it).
fn reconnect(&mut self) -> Result<()> {
let (reader, writer) = Self::open(&self.socket_path)?;
self.reader = reader;
self.writer = writer;
Ok(())
}
/// Call `method` with `params`, returning the `result` value (or an error /// Call `method` with `params`, returning the `result` value (or an error
/// carrying the RPC error's code and message). /// carrying the RPC error's code and message).
///
/// If the daemon has restarted and dropped the socket, this reconnects: it
/// retries transparently when the request never went out, and otherwise
/// reconnects for the next call while surfacing an error for this one (so a
/// mutation whose reply was lost is not silently re-applied).
pub fn call(&mut self, method: &str, params: Value) -> Result<Value> { pub fn call(&mut self, method: &str, params: Value) -> Result<Value> {
let id = self.next_id; let id = self.next_id;
self.next_id += 1; self.next_id += 1;
let mut line = serde_json::to_string(&json!({ let mut line = serde_json::to_string(&json!({
"id": id, "id": id,
"method": method, "method": method,
"params": params, "params": params,
}))?; }))?;
line.push('\n'); line.push('\n');
self.writer.write_all(line.as_bytes())?;
match self.exchange(&line) { self.writer.flush()?;
Ok(v) => Ok(v),
Err(ExchangeError::Rpc(e)) => Err(e),
Err(ExchangeError::Send(_)) => {
// The request never reached the daemon — reconnect and retry once.
self.reconnect()
.context("hephd connection lost and reconnect failed")?;
self.exchange(&line)
.map_err(ExchangeError::into_inner)
.with_context(|| format!("retrying `{method}` after reconnect"))
}
Err(ExchangeError::Recv(e)) => {
// Sent but no reply: the daemon likely restarted mid-request. Don't
// retry (a mutation may have applied); reconnect for next time and
// surface this one.
let _ = self.reconnect();
Err(e).context(
"hephd closed the connection mid-request (it likely restarted); \
reconnected re-run the action if it didn't take effect",
)
}
}
}
/// One request/response over the current connection, classifying failures.
fn exchange(&mut self, line: &str) -> std::result::Result<Value, ExchangeError> {
self.writer
.write_all(line.as_bytes())
.map_err(|e| ExchangeError::Send(e.into()))?;
self.writer
.flush()
.map_err(|e| ExchangeError::Send(e.into()))?;
let mut response_line = String::new(); let mut response_line = String::new();
let read = self let read = self.reader.read_line(&mut response_line)?;
.reader
.read_line(&mut response_line)
.map_err(|e| ExchangeError::Recv(e.into()))?;
if read == 0 { if read == 0 {
return Err(ExchangeError::Recv(anyhow!("hephd closed the connection"))); bail!("hephd closed the connection");
} }
let response: Response = let response: Response = serde_json::from_str(&response_line)?;
serde_json::from_str(&response_line).map_err(|e| ExchangeError::Rpc(e.into()))?;
if let Some(err) = response.error { if let Some(err) = response.error {
return Err(ExchangeError::Rpc(anyhow!( bail!("rpc error {}: {}", err.code, err.message);
"rpc error {}: {}",
err.code,
err.message
)));
} }
Ok(response.result.unwrap_or(Value::Null)) Ok(response.result.unwrap_or(Value::Null))
} }

View file

@ -109,7 +109,7 @@ impl KeyringTokenStore {
} }
}); });
keyring_core::Entry::new(&self.service, &self.account) keyring_core::Entry::new(&self.service, &self.account)
.map_err(|e| AuthError::Other(e.to_string())) .map_err(|e| AuthError::Provider(e.to_string()))
} }
} }
@ -119,16 +119,16 @@ impl TokenStore for KeyringTokenStore {
serde_json::from_str(&secret).ok() serde_json::from_str(&secret).ok()
} }
fn save(&self, token: &StoredToken) -> Result<(), AuthError> { fn save(&self, token: &StoredToken) -> Result<(), AuthError> {
let json = serde_json::to_string(token).map_err(|e| AuthError::Other(e.to_string()))?; let json = serde_json::to_string(token).map_err(|e| AuthError::Provider(e.to_string()))?;
self.entry()? self.entry()?
.set_password(&json) .set_password(&json)
.map_err(|e| AuthError::Other(e.to_string())) .map_err(|e| AuthError::Provider(e.to_string()))
} }
fn clear(&self) -> Result<(), AuthError> { fn clear(&self) -> Result<(), AuthError> {
match self.entry()?.delete_credential() { match self.entry()?.delete_credential() {
Ok(()) => Ok(()), Ok(()) => Ok(()),
Err(keyring_core::Error::NoEntry) => Ok(()), Err(keyring_core::Error::NoEntry) => Ok(()),
Err(e) => Err(AuthError::Other(e.to_string())), Err(e) => Err(AuthError::Provider(e.to_string())),
} }
} }
} }
@ -187,9 +187,6 @@ impl TokenResponse {
#[derive(Debug, Deserialize)] #[derive(Debug, Deserialize)]
struct TokenErrorBody { struct TokenErrorBody {
error: String, error: String,
/// Human-readable detail the provider may include (RFC 6749 §5.2).
#[serde(default)]
error_description: Option<String>,
} }
/// Drives the OAuth 2.0 device-code flow against one provider. /// Drives the OAuth 2.0 device-code flow against one provider.
@ -211,14 +208,17 @@ impl DeviceFlow {
let mut resp = http let mut resp = http
.get(&url) .get(&url)
.call() .call()
.map_err(|e| AuthError::Unreachable(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
if !resp.status().is_success() { if !resp.status().is_success() {
return Err(AuthError::rejected(resp.status().as_u16(), None, None)); return Err(AuthError::Provider(format!(
"discovery returned {}",
resp.status()
)));
} }
let doc: DiscoveryDoc = resp let doc: DiscoveryDoc = resp
.body_mut() .body_mut()
.read_json() .read_json()
.map_err(|e| AuthError::Other(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
Ok(DeviceFlow { Ok(DeviceFlow {
client_id: client_id.to_string(), client_id: client_id.to_string(),
http, http,
@ -233,13 +233,16 @@ impl DeviceFlow {
.http .http
.post(&self.device_authorization_endpoint) .post(&self.device_authorization_endpoint)
.send_form([("client_id", self.client_id.as_str()), ("scope", scope)]) .send_form([("client_id", self.client_id.as_str()), ("scope", scope)])
.map_err(|e| AuthError::Unreachable(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
if !resp.status().is_success() { if !resp.status().is_success() {
return Err(AuthError::rejected(resp.status().as_u16(), None, None)); return Err(AuthError::Provider(format!(
"device authorization returned {}",
resp.status()
)));
} }
resp.body_mut() resp.body_mut()
.read_json() .read_json()
.map_err(|e| AuthError::Other(e.to_string())) .map_err(|e| AuthError::Provider(e.to_string()))
} }
/// Poll the token endpoint until the user authorizes, the code expires, or /// Poll the token endpoint until the user authorizes, the code expires, or
@ -264,13 +267,13 @@ impl DeviceFlow {
("device_code", auth.device_code.as_str()), ("device_code", auth.device_code.as_str()),
("client_id", self.client_id.as_str()), ("client_id", self.client_id.as_str()),
]) ])
.map_err(|e| AuthError::Unreachable(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
if response.status().is_success() { if response.status().is_success() {
let token: TokenResponse = response let token: TokenResponse = response
.body_mut() .body_mut()
.read_json() .read_json()
.map_err(|e| AuthError::Other(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
return Ok(token.into_stored()); return Ok(token.into_stored());
} }
@ -278,7 +281,7 @@ impl DeviceFlow {
let body: TokenErrorBody = response let body: TokenErrorBody = response
.body_mut() .body_mut()
.read_json() .read_json()
.map_err(|e| AuthError::Other(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
match body.error.as_str() { match body.error.as_str() {
"authorization_pending" => {} "authorization_pending" => {}
"slow_down" => interval += 5, "slow_down" => interval += 5,
@ -298,24 +301,17 @@ impl DeviceFlow {
("refresh_token", refresh_token), ("refresh_token", refresh_token),
("client_id", self.client_id.as_str()), ("client_id", self.client_id.as_str()),
]) ])
.map_err(|e| AuthError::Unreachable(e.to_string()))?; .map_err(|e| AuthError::Provider(e.to_string()))?;
if !response.status().is_success() { if !response.status().is_success() {
// The IdP was reached and refused the grant (typically a `400 return Err(AuthError::Provider(format!(
// invalid_grant` once the refresh token is expired/rotated). Report "token refresh returned {}",
// it as a *rejection* with the OAuth error body — not "unreachable", response.status()
// which would misdirect debugging toward the network. )));
let status = response.status().as_u16();
let body = response.body_mut().read_json::<TokenErrorBody>().ok();
return Err(AuthError::rejected(
status,
body.as_ref().map(|b| b.error.as_str()),
body.as_ref().and_then(|b| b.error_description.as_deref()),
));
} }
let mut token: StoredToken = response let mut token: StoredToken = response
.body_mut() .body_mut()
.read_json::<TokenResponse>() .read_json::<TokenResponse>()
.map_err(|e| AuthError::Other(e.to_string()))? .map_err(|e| AuthError::Provider(e.to_string()))?
.into_stored(); .into_stored();
// Providers may omit the refresh token on refresh — keep the old one. // Providers may omit the refresh token on refresh — keep the old one.
if token.refresh_token.is_none() { if token.refresh_token.is_none() {

View file

@ -20,7 +20,6 @@ use tokio::net::{UnixListener, UnixStream};
use heph_core::Store; use heph_core::Store;
use crate::auth::AuthError;
use crate::oauth::{self, TokenStore}; use crate::oauth::{self, TokenStore};
use crate::rpc::{self, Request, Response, RpcError, INTERNAL_ERROR, PARSE_ERROR}; use crate::rpc::{self, Request, Response, RpcError, INTERNAL_ERROR, PARSE_ERROR};
use crate::selfupdate::{self, SelfUpdateConfig}; use crate::selfupdate::{self, SelfUpdateConfig};
@ -81,25 +80,10 @@ fn is_auth_error(e: &anyhow::Error) -> bool {
.is_some_and(|s| s == reqwest::StatusCode::UNAUTHORIZED) .is_some_and(|s| s == reqwest::StatusCode::UNAUTHORIZED)
} }
/// The exact `heph auth login …` command that re-authenticates this spoke, built /// Fold one exchange outcome into the shared [`SyncHealth`].
/// from the hub URL + issuer + client id the daemon is configured with — so the fn record_sync_outcome(health: &Arc<Mutex<SyncHealth>>, result: &Result<sync::SyncReport>) {
/// surfaced error tells the user *what to run*, not just that auth failed.
/// `None` for an unauthenticated / standalone instance. The hub-URL string must
/// match what the credential store is keyed under, which is exactly `hub_url`.
fn reauth_command(hub_url: Option<&str>, auth: Option<&SpokeAuth>) -> Option<String> {
let (hub, auth) = (hub_url?, auth?);
Some(format!(
"heph auth login --hub-url {hub} --issuer {} --client-id {}",
auth.issuer, auth.client_id
))
}
/// Fold one exchange outcome into the shared [`SyncHealth`]. On an auth failure
/// (a 401 from the hub) the recorded error carries the actionable re-login
/// command, so `heph sync --status` / `heph auth status` / the TUI show the fix.
fn record_sync_outcome(ctx: &Ctx, result: &Result<sync::SyncReport>) {
let now = now_ms(); let now = now_ms();
let mut h = ctx.sync_health.lock().expect("sync_health mutex poisoned"); let mut h = health.lock().expect("sync_health mutex poisoned");
h.last_attempt_ms = Some(now); h.last_attempt_ms = Some(now);
match result { match result {
Ok(_) => { Ok(_) => {
@ -108,67 +92,28 @@ fn record_sync_outcome(ctx: &Ctx, result: &Result<sync::SyncReport>) {
h.auth_failure = false; h.auth_failure = false;
} }
Err(e) => { Err(e) => {
let auth_failure = is_auth_error(e); h.auth_failure = is_auth_error(e);
h.auth_failure = auth_failure; h.last_error = Some(e.to_string());
h.last_error = Some(annotate_reauth(
e.to_string(),
auth_failure,
ctx.hub_url.as_deref(),
ctx.auth.as_ref(),
));
} }
} }
} }
/// Record a failure to obtain a bearer token (the refresh step, before any hub
/// request). A *rejection* (the IdP refused the refresh) is an auth failure and
/// gets the re-login hint; a transport failure stays a transient error. Surfacing
/// this here means `last_error` reflects the real cause (e.g. `invalid_grant`)
/// instead of only the downstream 401 on `/sync/pull`.
fn record_bearer_failure(ctx: &Ctx, err: &AuthError) {
let now = now_ms();
let auth_failure = err.is_rejection();
let mut h = ctx.sync_health.lock().expect("sync_health mutex poisoned");
h.last_attempt_ms = Some(now);
h.auth_failure = auth_failure;
h.last_error = Some(annotate_reauth(
format!("could not obtain bearer token: {err}"),
auth_failure,
ctx.hub_url.as_deref(),
ctx.auth.as_ref(),
));
}
/// Append the actionable re-login command to `msg` when this is an auth failure
/// and the spoke has auth configured.
fn annotate_reauth(
msg: String,
auth_failure: bool,
hub_url: Option<&str>,
auth: Option<&SpokeAuth>,
) -> String {
match reauth_command(hub_url, auth) {
Some(cmd) if auth_failure => format!("{msg} — re-authenticate: {cmd}"),
_ => msg,
}
}
impl Ctx { impl Ctx {
/// The current bearer token for hub sync (refreshing if expired). `Ok(None)` /// The current bearer token for hub sync (refreshing if expired), or `None`
/// means this spoke has no auth configured / no token stored (it syncs /// if this spoke has no auth configured / no usable token.
/// unauthenticated); `Err` means token acquisition genuinely failed (the async fn bearer(&self) -> Option<String> {
/// caller records it and skips the attempt rather than 401ing the hub). let auth = self.auth.clone()?;
async fn bearer(&self) -> Result<Option<String>, AuthError> { let result = tokio::task::spawn_blocking(move || {
let Some(auth) = self.auth.clone() else {
return Ok(None);
};
match tokio::task::spawn_blocking(move || {
oauth::current_bearer(auth.store.as_ref(), &auth.issuer, &auth.client_id) oauth::current_bearer(auth.store.as_ref(), &auth.issuer, &auth.client_id)
}) })
.await .await;
{ match result {
Ok(res) => res, Ok(Ok(token)) => token,
Err(_join) => Ok(None), // the blocking task panicked; treat as no token Ok(Err(e)) => {
tracing::warn!("could not obtain bearer token: {e}");
None
}
Err(_) => None,
} }
} }
} }
@ -278,20 +223,10 @@ impl Daemon {
let mut tick = tokio::time::interval(interval); let mut tick = tokio::time::interval(interval);
loop { loop {
tick.tick().await; tick.tick().await;
let bearer = match ctx.bearer().await { let bearer = ctx.bearer().await;
Ok(b) => b,
Err(e) => {
// Couldn't get a token — record the real cause (e.g. a
// rejected refresh) and skip; sending an unauthenticated
// request would only 401 and mask it.
record_bearer_failure(&ctx, &e);
tracing::warn!("background sync: could not obtain bearer token: {e}");
continue;
}
};
let result = let result =
sync::sync_once(ctx.store.clone(), &hub, &ctx.http, bearer.as_deref()).await; sync::sync_once(ctx.store.clone(), &hub, &ctx.http, bearer.as_deref()).await;
record_sync_outcome(&ctx, &result); record_sync_outcome(&ctx.sync_health, &result);
match result { match result {
Ok(report) => tracing::debug!(?report, "background sync"), Ok(report) => tracing::debug!(?report, "background sync"),
Err(e) => tracing::warn!("background sync failed: {e}"), Err(e) => tracing::warn!("background sync failed: {e}"),
@ -386,25 +321,9 @@ async fn sync_now(ctx: &Ctx) -> Result<Value, RpcError> {
message: "no hub_url configured; this instance is standalone".into(), message: "no hub_url configured; this instance is standalone".into(),
}); });
}; };
let bearer = match ctx.bearer().await { let bearer = ctx.bearer().await;
Ok(b) => b,
Err(e) => {
// Token acquisition failed — record the real cause (with a re-login
// hint when it's a rejection) and surface it instead of a downstream 401.
record_bearer_failure(ctx, &e);
return Err(RpcError {
code: INTERNAL_ERROR,
message: annotate_reauth(
format!("sync failed: could not obtain bearer token: {e}"),
e.is_rejection(),
ctx.hub_url.as_deref(),
ctx.auth.as_ref(),
),
});
}
};
let result = sync::sync_once(ctx.store.clone(), &hub_url, &ctx.http, bearer.as_deref()).await; let result = sync::sync_once(ctx.store.clone(), &hub_url, &ctx.http, bearer.as_deref()).await;
record_sync_outcome(ctx, &result); record_sync_outcome(&ctx.sync_health, &result);
match result { match result {
Ok(report) => Ok(json!(report)), Ok(report) => Ok(json!(report)),
Err(e) => Err(RpcError { Err(e) => Err(RpcError {
@ -455,22 +374,10 @@ async fn sync_status(ctx: &Ctx) -> Result<Value, RpcError> {
.expect("sync_health mutex poisoned") .expect("sync_health mutex poisoned")
.clone(); .clone();
// Non-secret OIDC params (issuer/client-id) + the exact re-login command, so
// `heph auth status` can show the fix without reconstructing it client-side
// (and keyed under the right hub URL — see the per-URL token-keying gotcha).
let auth = ctx.auth.as_ref().map(|a| {
json!({
"issuer": a.issuer,
"client_id": a.client_id,
})
});
Ok(json!({ Ok(json!({
"hub_url": hub_url, "hub_url": hub_url,
"cursors": cursors, "cursors": cursors,
"conflicts": conflicts, "conflicts": conflicts,
"health": health, "health": health,
"auth": auth,
"reauth_command": reauth_command(Some(&hub_url), ctx.auth.as_ref()),
})) }))
} }

View file

@ -261,14 +261,8 @@ async fn require_auth(
.await .await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)? .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
.map_err(|e| match e { .map_err(|e| match e {
// The token itself is missing/bad → tell the client it's unauthorized. AuthError::Provider(_) => StatusCode::SERVICE_UNAVAILABLE,
AuthError::Missing | AuthError::Invalid(_) => StatusCode::UNAUTHORIZED, _ => StatusCode::UNAUTHORIZED,
// We couldn't reach/process the IdP to fetch verification keys — a
// transient hub-side problem, not the client's token. Ask them to
// retry rather than claiming their token is invalid.
AuthError::Unreachable(_) | AuthError::Rejected(_) | AuthError::Other(_) => {
StatusCode::SERVICE_UNAVAILABLE
}
})?; })?;
// Multi-tenancy seam: resolve the token's identity to the owner it may act // Multi-tenancy seam: resolve the token's identity to the owner it may act

View file

@ -1,96 +0,0 @@
//! [`Client`] survives the daemon dropping the socket (opt-in self-update, `heph
//! daemon restart`). A mock daemon serves exactly one request per connection
//! then closes it, forcing the client to reconnect — without auto-reconnect,
//! every call after the first would fail forever.
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixListener;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;
use hephd::Client;
use serde_json::{json, Value};
/// A mock daemon that handles ONE request per connection then closes it, looping
/// to accept the next connection. `served` counts total requests answered.
fn spawn_one_shot_daemon(socket: PathBuf, served: Arc<AtomicUsize>) {
thread::spawn(move || {
let listener = UnixListener::bind(&socket).unwrap();
for conn in listener.incoming() {
let Ok(mut stream) = conn else { continue };
let mut reader = BufReader::new(stream.try_clone().unwrap());
let mut line = String::new();
if reader.read_line(&mut line).unwrap_or(0) == 0 {
continue; // client opened then went away; wait for the next one
}
let req: Value = serde_json::from_str(&line).unwrap();
let n = served.fetch_add(1, Ordering::SeqCst) + 1;
let mut out = serde_json::to_string(&json!({
"id": req["id"],
"result": { "served": n },
}))
.unwrap();
out.push('\n');
let _ = stream.write_all(out.as_bytes());
let _ = stream.flush();
// `stream` drops here → the connection closes after one request.
}
});
}
fn wait_for(socket: &std::path::Path) {
for _ in 0..400 {
if socket.exists() {
return;
}
thread::sleep(Duration::from_millis(5));
}
panic!("mock daemon socket never appeared");
}
#[test]
fn client_reconnects_after_the_daemon_drops_the_socket() {
let dir = tempfile::tempdir().unwrap();
let socket = dir.path().join("d.sock");
let served = Arc::new(AtomicUsize::new(0));
spawn_one_shot_daemon(socket.clone(), served.clone());
wait_for(&socket);
let mut c = Client::connect(&socket).unwrap();
// First call works on the initial connection.
let r1 = c.call("ping", json!({})).unwrap();
assert_eq!(r1["served"], 1);
// The daemon has now closed that connection. With reconnect, the client
// recovers within a call or two (depending on whether the dead socket fails
// on write or on read); without it, every further call would fail forever.
let mut recovered = None;
for _ in 0..2 {
if let Ok(v) = c.call("ping", json!({})) {
recovered = Some(v);
break;
}
}
let r = recovered.expect("client should reconnect after the socket was dropped");
// The recovered call was served exactly once on the new connection — no
// double-serve from a spurious retry.
assert_eq!(r["served"], 2);
assert_eq!(served.load(Ordering::SeqCst), 2);
// And it keeps working across subsequent drops.
let r3 = {
let mut got = None;
for _ in 0..2 {
if let Ok(v) = c.call("ping", json!({})) {
got = Some(v);
break;
}
}
got.expect("client should keep reconnecting")
};
assert_eq!(r3["served"], 3);
}

View file

@ -90,25 +90,11 @@ async fn token(State(s): State<IdpState>, Form(form): Form<HashMap<String, Strin
})) }))
.into_response() .into_response()
} }
Some("refresh_token") => { Some("refresh_token") => Json(json!({
// A rotated/expired refresh token is refused with `400 invalid_grant` "access_token": "access-2",
// (RFC 6749 §5.2) — the case that used to be mislabeled "unreachable". "expires_in": 3600,
if form.get("refresh_token").map(String::as_str) == Some("refresh-expired") { }))
return ( .into_response(),
StatusCode::BAD_REQUEST,
Json(json!({
"error": "invalid_grant",
"error_description": "Token is not active",
})),
)
.into_response();
}
Json(json!({
"access_token": "access-2",
"expires_in": 3600,
}))
.into_response()
}
_ => ( _ => (
StatusCode::BAD_REQUEST, StatusCode::BAD_REQUEST,
Json(json!({ "error": "unsupported_grant_type" })), Json(json!({ "error": "unsupported_grant_type" })),
@ -143,48 +129,6 @@ fn refresh_keeps_the_old_refresh_token_when_omitted() {
assert_eq!(refreshed.refresh_token.as_deref(), Some("refresh-1")); assert_eq!(refreshed.refresh_token.as_deref(), Some("refresh-1"));
} }
#[test]
fn refresh_rejected_by_idp_is_a_rejection_not_unreachable() {
let issuer = start_idp();
let flow = DeviceFlow::discover(&issuer, "heph-cli").unwrap();
let err = flow.refresh("refresh-expired").unwrap_err();
// The whole point of the fix: a reachable IdP that returns 400 is a
// *rejection*, carrying the OAuth error body — not "unreachable".
assert!(err.is_rejection(), "expected a rejection, got: {err}");
let msg = err.to_string();
assert!(
msg.contains("rejected"),
"message should say rejected: {msg}"
);
assert!(
msg.contains("invalid_grant"),
"should include the OAuth error: {msg}"
);
assert!(
msg.contains("Token is not active"),
"should include error_description: {msg}"
);
assert!(
!msg.contains("unreachable"),
"must NOT claim the IdP was unreachable: {msg}"
);
}
#[test]
fn discovery_against_a_dead_idp_is_unreachable_not_a_rejection() {
use hephd::AuthError;
// Port 1 refuses the connection → a genuine transport failure.
let err = match DeviceFlow::discover("http://127.0.0.1:1/application/o/heph/", "heph-cli") {
Ok(_) => panic!("discovery should fail against a dead IdP"),
Err(e) => e,
};
assert!(
matches!(err, AuthError::Unreachable(_)),
"a connection failure must be Unreachable, got: {err}"
);
assert!(!err.is_rejection());
}
#[test] #[test]
fn memory_token_store_round_trips_and_reports_expiry() { fn memory_token_store_round_trips_and_reports_expiry() {
let store = MemoryTokenStore::default(); let store = MemoryTokenStore::default();

View file

@ -0,0 +1 @@
heph-tui's sync indicator now shows the last-sync age in seconds under a minute (`⟳ 26s`) instead of a flat `just now`, so the chip reads as a live heartbeat and a missed sync (the loop runs every 30s) shows up as the age climbing.

View file

@ -0,0 +1 @@
`heph daemon start`/`restart` can now bake the daemon's full runtime config into the managed service — `--mode`, `--hub-url`, `--http-addr`, `--oidc-issuer`/`--oidc-audience`/`--oidc-client-id`, and `--self-update-interval-secs` (previously only the bare `--self-update` bool was wired). Regenerating preserves whatever is already baked into the on-disk plist/unit, so a bare `start`/`restart` no longer silently drops spoke/hub or self-update config.

View file

@ -86,14 +86,6 @@ still the old binary until you restart it:
heph daemon restart heph daemon restart
``` ```
A restart (or an opt-in self-update) drops the daemon's unix socket out from
under any connected surface. The CLI and `heph-tui` **reconnect automatically**:
a read transparently retries on a fresh connection, and a long-running TUI
self-heals on its next tick — so a daemon restart no longer leaves the agenda
view stuck on errors. (A mutating action whose reply is lost mid-restart reports
"reconnected — re-run the action if it didn't take effect" rather than risk
applying twice.)
## Self-update (opt-in) ## Self-update (opt-in)
`hephd` can keep itself current: `heph daemon start --self-update` generates a `hephd` can keep itself current: `heph daemon start --self-update` generates a

View file

@ -130,41 +130,19 @@ spoke is visible at a glance rather than buried in the daemon log.
Make a change on `gilbert`, force a sync, and confirm it appears via the hub. Make a change on `gilbert`, force a sync, and confirm it appears via the hub.
### When sync stops authenticating
A spoke's refresh token can expire or be rotated (e.g. the IdP session lapses).
The spoke then can't refresh on its own and needs a re-login — but this is
**visible, not silent**:
- `heph-tui` shows a red `⚠ auth · heph auth status` chip in the status line.
- `heph auth status` prints the auth health and the **exact** re-login command,
pre-filled with this spoke's hub URL / issuer / client id:
```bash
heph auth status
```
- `heph sync --status`'s `last_error` names the real cause — a refresh
*rejection* (e.g. `HTTP 400 (invalid_grant)`), not a misleading "identity
provider unreachable" — and carries the same `heph auth login …` hint.
Run the printed `heph auth login …` command to restore sync.
## Current gaps (finalized by the blumeops deployment) ## Current gaps (finalized by the blumeops deployment)
The flag-level flow above works today; one enabler makes it a clean, managed The flag-level flow above works today; two enablers make it a clean, managed
deployment rather than a hand-run process — tracked in the `Hephaestus` project: deployment rather than a hand-run process — tracked in the `Hephaestus` project:
- **`heph daemon` only generates a `--mode local` service** (no `--hub-url` /
`--oidc-*`). So for now the hub and the spoke config are expressed as `hephd`
flags (run directly, or via the blumeops-managed systemd unit), not via
`heph daemon start`.
- **Path A seeding is manual** (copy the store + reset the device origin). A - **Path A seeding is manual** (copy the store + reset the device origin). A
small enabler — seed a hub from a snapshot with a fresh origin, or small enabler — seed a hub from a snapshot with a fresh origin, or
`hephd --owner-id` — would make this one step. `hephd --owner-id` — would make this one step.
> `heph daemon start`/`restart` can now bake the spoke/hub config (`--hub-url`,
> `--mode server`, `--http-addr`, `--oidc-*`) into the generated service (see
> [[run-the-daemon]]). The canonical hub on `indri` is still provisioned via the
> blumeops-managed systemd unit by deployment choice, not because `heph daemon`
> can't express it.
## Related ## Related
- [[run-the-daemon]] — manage the local daemon as an OS service - [[run-the-daemon]] — manage the local daemon as an OS service