Auth errors: distinguish IdP rejection from unreachable + actionable re-auth recovery #14

Merged
eblume merged 1 commit from feature/auth-error-clarity into main 2026-06-08 14:10:43 -07:00

Summary

Closes the spoke→hub auth-failure cluster (observed in the 2026-06-06 offline_access incident): the daemon reported a rejected OAuth refresh as "identity provider unreachable" and gave no recovery guidance.

Root cause: every failure in the OAuth path funneled into one AuthError::Provider whose Display was hardcoded "identity provider unreachable: {0}". A reachable IdP returning 400 invalid_grant on a refresh got labeled "unreachable", sending debugging toward the network. The real refresh cause was also swallowed — bearer() logged it and returned None, so sync health only ever showed the downstream 401 on /sync/pull.

Wording fix (auth.rs / oauth.rs)

  • Split AuthError into Unreachable (transport), Rejected (IdP returned an HTTP error — carries the RFC 6749 §5.2 error/error_description), and Other (keyring / malformed response, previously mislabeled too).
  • refresh()/discover()/start()/poll() classify transport vs status; refresh reads the OAuth error body on a non-2xx.
  • Hub-side token verify maps IdP-infra failures → 503, token failures → 401.
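A minimal sketch of the split described above, not the code from this PR. The variant names (Unreachable, Rejected, Other) and is_rejection come from the description; the field layout, the Attempt type, classify(), VerifyFailure, hub_status(), and the exact Display wording are hypothetical and exist only to illustrate the transport-vs-status classification and the 503/401 mapping.

```rust
use std::fmt;

/// Hypothetical shape of the split error type. Variant names follow the PR
/// text; the fields are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
enum AuthError {
    /// Transport failure: the IdP could not be reached at all.
    Unreachable(String),
    /// The IdP answered with an HTTP error; carries the RFC 6749 §5.2 fields.
    Rejected {
        status: u16,
        error: String,
        description: Option<String>,
    },
    /// Keyring or malformed-response failures.
    Other(String),
}

impl AuthError {
    /// True only when the IdP was reachable and said no.
    fn is_rejection(&self) -> bool {
        matches!(self, AuthError::Rejected { .. })
    }
}

impl fmt::Display for AuthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AuthError::Unreachable(cause) => {
                write!(f, "identity provider unreachable: {cause}")
            }
            AuthError::Rejected { status, error, description } => {
                write!(f, "identity provider rejected the request ({status} {error})")?;
                if let Some(d) = description {
                    write!(f, ": {d}")?;
                }
                Ok(())
            }
            AuthError::Other(cause) => write!(f, "auth error: {cause}"),
        }
    }
}

/// Hypothetical, HTTP-client-agnostic view of one attempt against the IdP.
enum Attempt {
    TransportFailed(String),
    Responded {
        status: u16,
        error: Option<String>,
        description: Option<String>,
    },
}

/// Classify transport failures separately from non-2xx responses.
fn classify(attempt: Attempt) -> Result<(), AuthError> {
    match attempt {
        Attempt::TransportFailed(cause) => Err(AuthError::Unreachable(cause)),
        Attempt::Responded { status, .. } if (200..300).contains(&status) => Ok(()),
        Attempt::Responded { status, error, description } => Err(AuthError::Rejected {
            status,
            // Placeholder used by this sketch when the body has no `error` field.
            error: error.unwrap_or_else(|| "unknown_error".to_string()),
            description,
        }),
    }
}

/// Hypothetical hub-side verification failure kinds.
enum VerifyFailure {
    IdpInfra,
    BadToken,
}

/// IdP-infrastructure failures map to 503, token failures to 401.
fn hub_status(failure: &VerifyFailure) -> u16 {
    match failure {
        VerifyFailure::IdpInfra => 503,
        VerifyFailure::BadToken => 401,
    }
}
```

With this shape, a 400 invalid_grant response can never render as "unreachable", because that wording is reachable only from the transport arm.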

Recovery UX (server.rs / heph / heph-tui)

  • bearer() now returns Result; the sync paths record the real acquisition failure (with a re-login hint when it's a rejection) instead of a masked 401, so last_error reflects the refresh cause (e.g. invalid_grant).
  • Sync health's last_error carries the exact heph auth login --hub-url … --issuer … --client-id … command. The daemon builds it from the configured hub URL, because tokens are keyed per hub URL and a login against a different URL would not fix the failing one. sync.status also returns issuer/client_id and the command.
  • New heph auth status prints auth health and the re-login command.
  • heph-tui's auth chip points at it: ⚠ auth · heph auth status.
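A rough sketch of how the recorded error might be assembled, under stated assumptions. The heph auth login flags (--hub-url, --issuer, --client-id) are taken from the description; HubConfig, AcquireError, relogin_command(), and sync_last_error() are hypothetical names, and the message wording is invented for illustration.

```rust
/// Hypothetical daemon-side configuration; field names are assumptions.
struct HubConfig {
    hub_url: String,
    issuer: String,
    client_id: String,
}

/// Hypothetical, simplified token-acquisition failure.
enum AcquireError {
    /// The IdP was reachable and refused the refresh (e.g. invalid_grant).
    Rejected(String),
    /// The IdP could not be reached.
    Unreachable(String),
}

/// Build the re-login command from the configured hub URL.
/// Note: this sketch does no shell quoting; values containing spaces would
/// need escaping before being shown to a user.
fn relogin_command(cfg: &HubConfig) -> String {
    format!(
        "heph auth login --hub-url {} --issuer {} --client-id {}",
        cfg.hub_url, cfg.issuer, cfg.client_id
    )
}

/// Text recorded as last_error. Only a rejection gets the re-login hint,
/// since logging in again does not help when the IdP is unreachable.
fn sync_last_error(err: &AcquireError, cfg: &HubConfig) -> String {
    match err {
        AcquireError::Rejected(cause) => format!(
            "token refresh rejected: {cause}; re-authenticate with: {}",
            relogin_command(cfg)
        ),
        AcquireError::Unreachable(cause) => {
            format!("identity provider unreachable: {cause}")
        }
    }
}
```

Keeping the hint on the rejection arm only is a design choice of this sketch: it keeps network failures from prompting a re-login that would not resolve them.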

Also corrects a now-stale gap note in set-up-sync-hub.md (daemon config baking landed in #12).

Testing

  • cargo test --workspace green; cargo clippy --workspace --all-targets clean; fmt + prek hooks pass
  • New tests/oauth.rs: a 400 invalid_grant refresh → Rejected with the OAuth body (not "unreachable"); a dead IdP → Unreachable
  • New auth.rs unit tests for AuthError::rejected formatting / is_rejection
  • heph-tui indicator test updated for the new chip
  • Smoke-tested heph auth status against the live daemon (renders hub + auth state; degrades gracefully on the older daemon that doesn't yet return the new fields)

Closes

  • 01KTFSHMYK… + 01KTFVNEQH… (duplicate "misleading identity provider unreachable")
  • 01KTFSHMYZ… (actionable re-auth guidance)

🤖 Generated with Claude Code

feat(hephd,heph,heph-tui): distinguish IdP rejection from unreachable + actionable re-auth
All checks were successful
Build / validate (pull_request) Successful in 6m12s
e943a940f1
The spoke OAuth path funneled every failure into one `AuthError::Provider`
whose Display was hardcoded "identity provider unreachable". So a reachable IdP
returning `400 invalid_grant` on a refresh was reported as "unreachable",
misdirecting incident response toward the network when the fix is re-auth. The
real refresh cause was also swallowed — `bearer()` logged it and returned None,
so sync health only ever showed the downstream 401 on /sync/pull.

Wording fix (auth.rs / oauth.rs):
- Split AuthError into Unreachable (transport), Rejected (IdP returned an HTTP
  error — carries the RFC 6749 §5.2 error/error_description), and Other
  (keyring / malformed response, previously mislabeled too).
- refresh()/discover()/start()/poll() classify transport vs status; refresh
  reads the OAuth error body on a non-2xx.
- Hub-side token verify maps IdP-infra failures → 503, token failures → 401.

Recovery UX (server.rs / heph / heph-tui):
- bearer() returns Result; the sync paths record the real acquisition failure
  (with a re-login hint when it's a rejection) instead of a masked 401.
- sync health's last_error carries the exact `heph auth login --hub-url …
  --issuer … --client-id …` command (keyed to the configured hub); sync.status
  also returns issuer/client_id + the command.
- New `heph auth status` prints auth health and the re-login command.
- heph-tui's auth chip points at it: `⚠ auth · heph auth status`.

Closes the duplicate "misleading identity provider unreachable" tasks and the
"actionable re-auth guidance" task. Also corrects a now-stale set-up-sync-hub
gap note (daemon config baking landed in the prior PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eblume merged commit 9a4f18fbd5 into main 2026-06-08 14:10:42 -07:00
Reference
eblume/hephaestus!14