feat(hephd,heph,heph-tui): distinguish IdP rejection from unreachable + actionable re-auth

The spoke OAuth path funneled every failure into one `AuthError::Provider` whose Display was hardcoded "identity provider unreachable". So a reachable IdP returning `400 invalid_grant` on a refresh was reported as "unreachable", misdirecting incident response toward the network when the fix is re-auth. The real refresh cause was also swallowed — `bearer()` logged it and returned None, so sync health only ever showed the downstream 401 on /sync/pull. Wording fix (auth.rs / oauth.rs): - Split AuthError into Unreachable (transport), Rejected (IdP returned an HTTP error — carries the RFC 6749 §5.2 error/error_description), and Other (keyring / malformed response, previously mislabeled too). - refresh()/discover()/start()/poll() classify transport vs status; refresh reads the OAuth error body on a non-2xx. - Hub-side token verify maps IdP-infra failures → 503, token failures → 401. Recovery UX (server.rs / heph / heph-tui): - bearer() returns Result; the sync paths record the real acquisition failure (with a re-login hint when it's a rejection) instead of a masked 401. - sync health's last_error carries the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to the configured hub); sync.status also returns issuer/client_id + the command. - New `heph auth status` prints auth health and the re-login command. - heph-tui's auth chip points at it: `⚠ auth · heph auth status`. Closes the duplicate "misleading identity provider unreachable" tasks and the "actionable re-auth guidance" task. Also corrects a now-stale set-up-sync-hub gap note (daemon config baking landed in the prior PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:06:08 -07:00 · 2026-06-08 14:06:08 -07:00 · e943a940f1
commit e943a940f1
parent b82264892f
10 changed files with 393 additions and 74 deletions
--- a/docs/changelog.d/auth-error-clarity.bugfix.md
+++ b/docs/changelog.d/auth-error-clarity.bugfix.md
@ -0,0 +1 @@
+hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a *rejection* — `identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication.
--- a/docs/changelog.d/auth-error-clarity.feature.md
+++ b/docs/changelog.d/auth-error-clarity.feature.md
@ -0,0 +1 @@
+Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip.
--- a/docs/how-to/set-up-sync-hub.md
+++ b/docs/how-to/set-up-sync-hub.md
@ -130,19 +130,41 @@ spoke is visible at a glance rather than buried in the daemon log.

 Make a change on `gilbert`, force a sync, and confirm it appears via the hub.

+### When sync stops authenticating
+
+A spoke's refresh token can expire or be rotated (e.g. the IdP session lapses).
+The spoke then can't refresh on its own and needs a re-login — but this is
+**visible, not silent**:
+
+- `heph-tui` shows a red `⚠ auth · heph auth status` chip in the status line.
+- `heph auth status` prints the auth health and the **exact** re-login command,
+  pre-filled with this spoke's hub URL / issuer / client id:
+
+  ```bash
+  heph auth status
+  ```
+
+- `heph sync --status`'s `last_error` names the real cause — a refresh
+  *rejection* (e.g. `HTTP 400 (invalid_grant)`), not a misleading "identity
+  provider unreachable" — and carries the same `heph auth login …` hint.
+
+Run the printed `heph auth login …` command to restore sync.
+
 ## Current gaps (finalized by the blumeops deployment)

-The flag-level flow above works today; two enablers make it a clean, managed
+The flag-level flow above works today; one enabler makes it a clean, managed
 deployment rather than a hand-run process — tracked in the `Hephaestus` project:

- **`heph daemon` only generates a `--mode local` service** (no `--hub-url` /
-  `--oidc-*`). So for now the hub and the spoke config are expressed as `hephd`
-  flags (run directly, or via the blumeops-managed systemd unit), not via
-  `heph daemon start`.
 - **Path A seeding is manual** (copy the store + reset the device origin). A
  small enabler — seed a hub from a snapshot with a fresh origin, or
  `hephd --owner-id` — would make this one step.

+> `heph daemon start`/`restart` can now bake the spoke/hub config (`--hub-url`,
+> `--mode server`, `--http-addr`, `--oidc-*`) into the generated service (see
+> [[run-the-daemon]]). The canonical hub on `indri` is still provisioned via the
+> blumeops-managed systemd unit by deployment choice, not because `heph daemon`
+> can't express it.
+
 ## Related

 - [[run-the-daemon]] — manage the local daemon as an OS service
				`@ -0,0 +1 @@`
				hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a rejection — `identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication.
				`@ -0,0 +1 @@`
				Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip.