feat(hephd,heph,heph-tui): distinguish IdP rejection from unreachable + actionable re-auth
All checks were successful
Build / validate (pull_request) Successful in 6m12s

The spoke OAuth path funneled every failure into one `AuthError::Provider`
whose Display was hardcoded "identity provider unreachable". So a reachable IdP
returning `400 invalid_grant` on a refresh was reported as "unreachable",
misdirecting incident response toward the network when the fix is re-auth. The
real refresh cause was also swallowed — `bearer()` logged it and returned None,
so sync health only ever showed the downstream 401 on /sync/pull.

Wording fix (auth.rs / oauth.rs):
- Split AuthError into Unreachable (transport), Rejected (IdP returned an HTTP
  error — carries the RFC 6749 §5.2 error/error_description), and Other
  (keyring / malformed response, previously mislabeled too).
- refresh()/discover()/start()/poll() classify transport vs status; refresh
  reads the OAuth error body on a non-2xx.
- Hub-side token verify maps IdP-infra failures → 503, token failures → 401.

Recovery UX (server.rs / heph / heph-tui):
- bearer() returns Result; the sync paths record the real acquisition failure
  (with a re-login hint when it's a rejection) instead of a masked 401.
- sync health's last_error carries the exact `heph auth login --hub-url …
  --issuer … --client-id …` command (keyed to the configured hub); sync.status
  also returns issuer/client_id + the command.
- New `heph auth status` prints auth health and the re-login command.
- heph-tui's auth chip points at it: `⚠ auth · heph auth status`.

Closes the duplicate "misleading identity provider unreachable" tasks and the
"actionable re-auth guidance" task. Also corrects a now-stale set-up-sync-hub
gap note (daemon config baking landed in the prior PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Erich Blume 2026-06-08 14:06:08 -07:00
commit e943a940f1
10 changed files with 393 additions and 74 deletions

View file

@ -0,0 +1 @@
hephd no longer reports a rejected OAuth refresh as "identity provider unreachable". A reachable IdP that returns an HTTP error (e.g. `400 invalid_grant` once a refresh token expires/rotates) is now surfaced as a *rejection*`identity provider rejected the request: HTTP 400 (invalid_grant): …` — with the OAuth error body, distinct from a genuine transport failure. This stops the wording from misdirecting incident response toward the network when the real fix is re-authentication.

View file

@ -0,0 +1 @@
Spoke auth failures now tell you how to recover. When a refresh token is rejected or the hub returns 401, `hephd` records the real cause plus the exact `heph auth login --hub-url … --issuer … --client-id …` command (keyed to this spoke's hub) in its sync health. A new `heph auth status` prints that health and the re-login command, `heph sync --status`'s `last_error` carries it, and `heph-tui`'s status line points at it with a `⚠ auth · heph auth status` chip.

View file

@ -130,19 +130,41 @@ spoke is visible at a glance rather than buried in the daemon log.
Make a change on `gilbert`, force a sync, and confirm it appears via the hub.
### When sync stops authenticating
A spoke's refresh token can expire or be rotated (e.g. the IdP session lapses).
The spoke then can't refresh on its own and needs a re-login — but this is
**visible, not silent**:
- `heph-tui` shows a red `⚠ auth · heph auth status` chip in the status line.
- `heph auth status` prints the auth health and the **exact** re-login command,
pre-filled with this spoke's hub URL / issuer / client id:
```bash
heph auth status
```
- `heph sync --status`'s `last_error` names the real cause — a refresh
*rejection* (e.g. `HTTP 400 (invalid_grant)`), not a misleading "identity
provider unreachable" — and carries the same `heph auth login …` hint.
Run the printed `heph auth login …` command to restore sync.
## Current gaps (finalized by the blumeops deployment)
The flag-level flow above works today; two enablers make it a clean, managed
The flag-level flow above works today; one enabler makes it a clean, managed
deployment rather than a hand-run process — tracked in the `Hephaestus` project:
- **`heph daemon` only generates a `--mode local` service** (no `--hub-url` /
`--oidc-*`). So for now the hub and the spoke config are expressed as `hephd`
flags (run directly, or via the blumeops-managed systemd unit), not via
`heph daemon start`.
- **Path A seeding is manual** (copy the store + reset the device origin). A
small enabler — seed a hub from a snapshot with a fresh origin, or
`hephd --owner-id` — would make this one step.
> `heph daemon start`/`restart` can now bake the spoke/hub config (`--hub-url`,
> `--mode server`, `--http-addr`, `--oidc-*`) into the generated service (see
> [[run-the-daemon]]). The canonical hub on `indri` is still provisioned via the
> blumeops-managed systemd unit by deployment choice, not because `heph daemon`
> can't express it.
## Related
- [[run-the-daemon]] — manage the local daemon as an OS service