C2(hephd-self-update): plan add goal + prerequisite cards for hephd self-update

Kick off the C2 Mikado chain for an opt-in (default-off) hephd self-update mode (forge-poll -> cargo install from tag -> self-restart). Goal card plus eight prerequisite cards, indexed from how-to.md:

  release-poll-version-check, self-update-opt-in-flag (leaves)
    -> self-update-poll-loop (notify-only core)
  service-env-forge-access (leaf, the cargo/forge blocker)
    + self-update-poll-loop -> cargo-install-from-tag
  service-respawn-on-clean-exit (leaf, systemd Restart=always)
    + cargo-install-from-tag -> self-restart-after-update
  verify-hub-dropout-resilience (leaf, lock in the base-case guarantee)

Grounded in research of hephd's sync loop, daemon lifecycle, the launchd/systemd service templates, and the forge releases API.

Captured from Hephaestus task 01KTA2NSNRYT902HC3VRW00S1J.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 13:20:46 -07:00 · e6524fddbb
commit e6524fddbb
parent 254c83036b
10 changed files with 344 additions and 0 deletions
--- a/docs/how-to/how-to.md
+++ b/docs/how-to/how-to.md
@@ -20,3 +20,17 @@ Task-oriented guides for common operations.
 - [[run-the-daemon]] — Run `hephd` as an OS service with `heph daemon start/stop/restart/status`
 - [[set-up-sync-hub]] — Stand up the canonical hub (indri) and connect an existing device as an offline-capable spoke
 - [[import-todoist]] — Seed a heph store from your Todoist projects + tasks (`mise run import-todoist`)
+
+## Active Mikado chains
+
+C2 chain: **hephd self-update** (opt-in daemon auto-update). See [[agent-change-process]] for the method.
+
+- [[hephd-self-update]] — goal: opt-in, default-off mode where `hephd` polls for new releases and auto-updates itself
+- [[self-update-opt-in-flag]] — the `--self-update` opt-in flag (default off)
+- [[release-poll-version-check]] — poll the forge releases API and semver-compare against the running version
+- [[self-update-poll-loop]] — background task wiring the flag to the version check (notify-only core)
+- [[service-env-forge-access]] — give the daemon's service environment cargo + forge SSH access (the cargo/forge blocker)
+- [[cargo-install-from-tag]] — rebuild + install the new binaries via `cargo install` from the release tag
+- [[service-respawn-on-clean-exit]] — make the service manager respawn hephd after a clean exit (systemd `Restart=always`)
+- [[self-restart-after-update]] — exit cleanly after a successful install so the new binary takes over
+- [[verify-hub-dropout-resilience]] — lock in "the hub can vanish at any moment" as the base case
--- a/docs/how-to/self-update/cargo-install-from-tag.md
+++ b/docs/how-to/self-update/cargo-install-from-tag.md
@@ -0,0 +1,38 @@
+---
+title: Cargo install from tag
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires:
+  - self-update-poll-loop
+  - service-env-forge-access
+---
+
+# Cargo install from tag
+
+The apply step: when the poll loop detects a newer release, rebuild + install
+the new binaries from the release tag.
+
+## Deliverables
+
+- From the detected tag `vX.Y.Z`, run (via `tokio::task::spawn_blocking`, since
+  it's a long blocking child process):
+  ```
+  cargo install --locked \
+    --git ssh://forgejo@forge.ops.eblu.me:2222/eblume/hephaestus.git \
+    --tag vX.Y.Z heph hephd heph-tui heph-quickadd
+  ```
+  This is the exact command the install how-to and the manual redeploy use; it
+  swaps `~/.cargo/bin/*` in place.
+- Capture stdout/stderr and exit status; log success/failure. A failed build
+  must **not** restart the daemon — only a successful install proceeds to
+  [[self-restart-after-update]].
+- Guard against re-running while an install is in flight (the long compile spans
+  multiple poll ticks): a simple "update in progress" flag.
+
+## Done when
+
+On a real newer tag, the daemon completes the install and the new binary is on
+disk at `~/.cargo/bin`. Requires [[self-update-poll-loop]] and
+[[service-env-forge-access]]. Part of [[hephd-self-update]].
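A minimal sketch of the apply step described in this card, using only the standard library. The names `UPDATE_IN_PROGRESS`, `InstallOutcome`, `try_begin_update`, and `install_tag` are hypothetical and do not come from the hephd codebase; the git URL and binary list are copied from the card. In the daemon, `install_tag` would presumably be called inside `tokio::task::spawn_blocking`, as the card suggests.

```rust
use std::process::Command;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical process-wide "update in progress" flag.
static UPDATE_IN_PROGRESS: AtomicBool = AtomicBool::new(false);

/// Git URL quoted in the card.
const FORGE_GIT_URL: &str = "ssh://forgejo@forge.ops.eblu.me:2222/eblume/hephaestus.git";

/// Result of one install attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum InstallOutcome {
    Installed,
    Failed(String),
    AlreadyRunning,
}

/// Clears the in-progress flag when dropped, even on early return.
struct InProgressGuard;

impl Drop for InProgressGuard {
    fn drop(&mut self) {
        UPDATE_IN_PROGRESS.store(false, Ordering::SeqCst);
    }
}

/// Returns a guard only if no other install is currently in flight.
fn try_begin_update() -> Option<InProgressGuard> {
    UPDATE_IN_PROGRESS
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .ok()
        .map(|_| InProgressGuard)
}

/// Builds the `cargo install` argument list for a release tag.
fn cargo_install_args(tag: &str) -> Vec<String> {
    let mut args: Vec<String> = ["install", "--locked", "--git", FORGE_GIT_URL, "--tag", tag]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.extend(["heph", "hephd", "heph-tui", "heph-quickadd"].iter().map(|s| s.to_string()));
    args
}

/// Blocking: runs cargo, captures its output, and reports the outcome.
/// Only `Installed` should lead on to the restart step.
fn install_tag(tag: &str) -> InstallOutcome {
    let Some(_guard) = try_begin_update() else {
        return InstallOutcome::AlreadyRunning;
    };
    match Command::new("cargo").args(cargo_install_args(tag)).output() {
        Ok(out) if out.status.success() => InstallOutcome::Installed,
        Ok(out) => InstallOutcome::Failed(String::from_utf8_lossy(&out.stderr).into_owned()),
        Err(e) => InstallOutcome::Failed(e.to_string()),
    }
}
```

The drop guard means a failed or panicking install still clears the flag, so a later poll tick can retry.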
--- a/docs/how-to/self-update/hephd-self-update.md
+++ b/docs/how-to/self-update/hephd-self-update.md
@@ -0,0 +1,51 @@
+---
+title: hephd self-update
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+branch: mikado/hephd-self-update
+requires:
+  - self-restart-after-update
+  - verify-hub-dropout-resilience
+---
+
+# hephd self-update
+
+**Goal (desired end state).** An opt-in, **default-off** mode where `hephd`
+periodically polls the forge for a newer release and, when one exists,
+rebuilds via `cargo install` from the release tag and restarts itself onto the
+new binary — unattended.
+
+## End state
+
+- A new daemon flag (`--self-update`, default off) plus a poll interval. When
+  off, behaviour is unchanged. See [[self-update-opt-in-flag]].
+- A background task (modelled on the existing spoke sync loop,
+  `crates/hephd/src/server.rs` `spawn_sync_loop`) that on each tick fetches the
+  latest release and compares it to `heph_core::VERSION`. See
+  [[self-update-poll-loop]] and [[release-poll-version-check]].
+- On a newer release: run `cargo install --locked --git ssh://… --tag vX.Y.Z`
+  for all workspace binaries ([[cargo-install-from-tag]]), then exit cleanly so
+  the OS service manager respawns the new binary
+  ([[self-restart-after-update]], [[service-respawn-on-clean-exit]]).
+- Running `cargo install` from inside the service requires the daemon's
+  environment to have cargo + forge SSH access — the known blocker tracked in
+  [[service-env-forge-access]].
+
+## Design decisions (owner)
+
+- **Default off**, opt-in only. Never self-update silently by default.
+- Delivery is **`cargo install` from the tag** for now (prebuilt release
+  binaries are a possible future, pending a cargo/forge canonical-host fix).
+- **Hub can disappear at any moment** — that resilience is the *base case*, not
+  a special guard. The sync loop already tolerates an unreachable hub; we lock
+  that in rather than add update-specific guards. See
+  [[verify-hub-dropout-resilience]].
+
+## Scope notes
+
+Captured from task `01KTA2NSNRYT902HC3VRW00S1J` in the `Hephaestus` project.
+Possible later refinements (own cards if pursued): checksum/signature
+verification of the built binary, prebuilt release-binary delivery, and a
+notify-only sub-mode.
--- a/docs/how-to/self-update/release-poll-version-check.md
+++ b/docs/how-to/self-update/release-poll-version-check.md
@@ -0,0 +1,32 @@
+---
+title: Release poll + version check
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires: []
+---
+
+# Release poll + version check
+
+The piece that answers "is a newer release available?" — independent of any
+daemon wiring, so it can be unit-tested in isolation.
+
+## Deliverables
+
+- Fetch the latest release from the forge:
+  `GET https://forge.ops.eblu.me/api/v1/repos/eblume/hephaestus/releases/latest`,
+  read `tag_name` (e.g. `v1.0.4`). hephd already depends on `ureq` and
+  `reqwest` (`crates/hephd/Cargo.toml`) — reuse one (the poll loop is async, so
+  `reqwest` fits; `ureq` would need `spawn_blocking`).
+- Parse the running version: `heph_core::VERSION` is `"1.0.3 (sha)"` — take the
+  `X.Y.Z` head. Add `semver = "1"` to `crates/hephd/Cargo.toml` (already in the
+  lockfile transitively) and compare `tag_name` (strip leading `v`) against it.
+- A pure `is_newer(current, tag) -> bool` helper with tests covering equal /
+  older / newer / malformed tags.
+
+## Done when
+
+Given a fixed current version and a sample releases-API JSON body, the helper
+correctly reports whether an update exists. No daemon loop yet — that's
+[[self-update-poll-loop]]. Part of [[hephd-self-update]].
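A rough sketch of the pure `is_newer(current, tag) -> bool` helper this card asks for. To stay dependency-free it hand-parses plain `X.Y.Z` triples instead of using the `semver` crate the card proposes, so pre-release and build-metadata suffixes are treated as malformed here; `parse_version` is a hypothetical helper name.

```rust
/// Parses "X.Y.Z", tolerating a leading `v` and a trailing " (sha)" suffix.
/// Returns None for anything that is not exactly three numeric components.
fn parse_version(raw: &str) -> Option<(u64, u64, u64)> {
    let head = raw.split_whitespace().next()?;
    let head = head.strip_prefix('v').unwrap_or(head);
    let mut parts = head.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True only when both versions parse and the tag is strictly newer.
/// Malformed input on either side yields false, so a bad tag never
/// triggers an update.
fn is_newer(current: &str, tag: &str) -> bool {
    match (parse_version(current), parse_version(tag)) {
        (Some(cur), Some(new)) => new > cur,
        _ => false,
    }
}
```

Comparing numeric tuples rather than strings matters for cases such as `1.0.9` versus `1.0.10`.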
--- a/docs/how-to/self-update/self-restart-after-update.md
+++ b/docs/how-to/self-update/self-restart-after-update.md
@@ -0,0 +1,34 @@
+---
+title: Self-restart after update
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires:
+  - cargo-install-from-tag
+  - service-respawn-on-clean-exit
+---
+
+# Self-restart after update
+
+The last step: once the new binary is installed, get the running daemon to hand
+off to it.
+
+## Deliverables
+
+- After a successful [[cargo-install-from-tag]], have hephd exit cleanly
+  (`std::process::exit(0)`) so the service manager respawns the new binary.
+  hephd has no graceful-shutdown path today (`serve` is an infinite accept
+  loop) — a clean process exit is acceptable; in-flight RPC connections simply
+  drop and clients reconnect (the plugin already reconnects-once).
+- Relies on [[service-respawn-on-clean-exit]] so the exit is actually followed
+  by a respawn on both platforms.
+- Log a clear "restarting into vX.Y.Z" line before exit. Optionally re-check
+  that the on-disk version actually changed before restarting, to avoid a
+  restart loop if the install was a no-op.
+
+## Done when
+
+End-to-end: an enabled daemon on an older version detects a newer release,
+installs it, restarts, and comes back reporting the new `version` RPC value.
+This closes the apply path of [[hephd-self-update]].
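A small sketch of the optional "did the on-disk version actually change?" check before the clean exit. `should_restart` and `restart_if_changed` are hypothetical names, and how the on-disk version string is obtained is left open here because the card does not specify it.

```rust
/// Compares the `X.Y.Z` head of the running version with the version found
/// on disk after the install. An empty or unreadable on-disk version never
/// triggers a restart.
fn should_restart(running: &str, on_disk: &str) -> bool {
    let head = |v: &str| {
        v.split_whitespace()
            .next()
            .unwrap_or("")
            .trim_start_matches('v')
            .to_string()
    };
    let (running, on_disk) = (head(running), head(on_disk));
    !on_disk.is_empty() && running != on_disk
}

/// Logs and exits cleanly so the service manager can respawn the new binary.
/// Returns normally when the install turned out to be a no-op.
fn restart_if_changed(running: &str, on_disk: &str) {
    if should_restart(running, on_disk) {
        eprintln!("restarting into {on_disk}");
        std::process::exit(0);
    }
    eprintln!("install was a no-op (still {running}); not restarting");
}
```

Skipping the exit when nothing changed is what prevents the restart loop the card warns about.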
--- a/docs/how-to/self-update/self-update-opt-in-flag.md
+++ b/docs/how-to/self-update/self-update-opt-in-flag.md
@@ -0,0 +1,32 @@
+---
+title: Self-update opt-in flag
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires: []
+---
+
+# Self-update opt-in flag
+
+The opt-in surface. hephd config today is pure clap flags (no config file) in
+`crates/hephd/src/main.rs`.
+
+## Deliverables
+
+- Add `--self-update` (bool, **default false**) and an interval override (e.g.
+  `--self-update-interval-secs`, with a sane default like 6h). Document them in
+  the flag help.
+- Thread them into the daemon the same way `--hub-url` / spoke auth are
+  (`Daemon::new(...).with_hub(...)` → add `.with_self_update(cfg)`).
+- When the flag is absent, the daemon behaves exactly as today (the loop in
+  [[self-update-poll-loop]] is simply not spawned).
+- Later, bake the flag into the generated service definition (launchd/systemd)
+  so an enabled daemon keeps self-updating across restarts — coordinate with
+  [[service-respawn-on-clean-exit]] (same templates in `crates/heph/src/service.rs`).
+
+## Done when
+
+`hephd --self-update` starts the daemon with the mode enabled (verifiable via a
+startup log line); omitting it leaves current behaviour untouched. Part of
+[[hephd-self-update]].
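A dependency-free sketch of the opt-in surface. The real daemon uses clap, so the hand-rolled parser below is only a stand-in to show the intended defaults (off, 6h interval); `SelfUpdateConfig` and `parse_self_update_args` are hypothetical names, while the two flag names come from the card.

```rust
use std::time::Duration;

/// Hypothetical config value that would be threaded into the daemon.
#[derive(Debug, Clone, PartialEq)]
struct SelfUpdateConfig {
    enabled: bool,
    interval: Duration,
}

impl Default for SelfUpdateConfig {
    /// Default off, polling every 6 hours when enabled.
    fn default() -> Self {
        Self {
            enabled: false,
            interval: Duration::from_secs(6 * 60 * 60),
        }
    }
}

/// Picks the two self-update flags out of an argument list and ignores
/// everything else. Stand-in for the clap definitions in the real daemon.
fn parse_self_update_args(args: &[&str]) -> Result<SelfUpdateConfig, String> {
    let mut cfg = SelfUpdateConfig::default();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "--self-update" => cfg.enabled = true,
            "--self-update-interval-secs" => {
                let value = iter
                    .next()
                    .ok_or("--self-update-interval-secs needs a value")?;
                let secs: u64 = value
                    .parse()
                    .map_err(|e| format!("bad interval {value:?}: {e}"))?;
                cfg.interval = Duration::from_secs(secs);
            }
            _ => {} // unrelated flags are out of scope for this sketch
        }
    }
    Ok(cfg)
}
```

With no flags given, the result equals `SelfUpdateConfig::default()`, which matches the "behaves exactly as today" requirement.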
--- a/docs/how-to/self-update/self-update-poll-loop.md
+++ b/docs/how-to/self-update/self-update-poll-loop.md
@@ -0,0 +1,34 @@
+---
+title: Self-update poll loop
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires:
+  - release-poll-version-check
+  - self-update-opt-in-flag
+---
+
+# Self-update poll loop
+
+The background task that ties the flag to the version check. This card alone
+yields a working **notify-only** daemon ("update available: vX.Y.Z" in the
+log) — the apply path layers on after.
+
+## Deliverables
+
+- Spawn a `tokio` task modelled on `spawn_sync_loop`
+  (`crates/hephd/src/server.rs`): `tokio::time::interval` ticking at the
+  configured cadence, guarded so it's a no-op unless `--self-update` is set.
+- Each tick: run the [[release-poll-version-check]]. On "newer available", log
+  it (and, once the apply path exists, hand off to [[cargo-install-from-tag]]).
+- Errors (forge unreachable, bad JSON) are logged and the loop continues —
+  same resilience pattern the sync loop uses. A flaky forge must never crash or
+  block the daemon.
+
+## Done when
+
+With `--self-update` on and a stubbed/real "newer" release, the daemon logs an
+update-available line once per detection; with the flag off, no task runs.
+Requires [[release-poll-version-check]] and [[self-update-opt-in-flag]]. Part of
+[[hephd-self-update]].
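A synchronous sketch of the notify-only loop, with `std::thread::sleep` standing in for `tokio::time::interval` so it runs without external crates. `poll_tick` and `run_poll_loop` are hypothetical names; the `check` closure stands in for the release poll and returns `Ok(Some(tag))` when a newer release exists. Reading "once per detection" as "do not repeat the line for the same tag" is an assumption.

```rust
use std::time::Duration;

/// Runs one tick: logs a newly detected tag once, logs errors, never panics.
fn poll_tick<F>(check: &mut F, last_notified: &mut Option<String>, log: &mut Vec<String>)
where
    F: FnMut() -> Result<Option<String>, String>,
{
    match check() {
        Ok(Some(tag)) => {
            // Only announce a tag that has not been announced before.
            if last_notified.as_deref() != Some(tag.as_str()) {
                log.push(format!("update available: {tag}"));
                *last_notified = Some(tag);
            }
        }
        Ok(None) => {}
        // A flaky forge is logged and the loop simply carries on.
        Err(e) => log.push(format!("self-update check failed: {e}")),
    }
}

/// Runs a bounded number of ticks and returns the collected log lines.
/// The real task would loop forever and write to `tracing` instead.
fn run_poll_loop<F>(mut check: F, ticks: usize, interval: Duration) -> Vec<String>
where
    F: FnMut() -> Result<Option<String>, String>,
{
    let mut last_notified = None;
    let mut log = Vec::new();
    for _ in 0..ticks {
        poll_tick(&mut check, &mut last_notified, &mut log);
        std::thread::sleep(interval);
    }
    log
}
```

Keeping the per-tick logic in `poll_tick` makes it testable with a stubbed check, independent of any timer.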
--- a/docs/how-to/self-update/service-env-forge-access.md
+++ b/docs/how-to/self-update/service-env-forge-access.md
@@ -0,0 +1,37 @@
+---
+title: Service env forge access
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires: []
+---
+
+# Service env forge access
+
+The known blocker. `cargo install --git ssh://forgejo@forge.ops.eblu.me:2222/…`
+works from an interactive shell (it has an SSH agent/key and cargo on PATH) —
+but the daemon runs under launchd/systemd, whose environment likely has
+**neither**. Self-update via cargo can't work until the service context can
+reach the forge and run cargo.
+
+## What to establish
+
+- **cargo + toolchain on the service PATH.** launchd/systemd start with a
+  minimal env; `~/.cargo/bin` and rustup's toolchain must be discoverable.
+  Likely bake `PATH`/`EnvironmentFile` into the generated plist/unit
+  (`crates/heph/src/service.rs`).
+- **Forge SSH auth without an interactive agent.** Options to evaluate: a
+  dedicated read-only deploy key referenced via `GIT_SSH_COMMAND`/an SSH config
+  entry, or `SSH_AUTH_SOCK` exported to the service. Must work headless.
+- **The canonical-host caveat.** Owner note: cargo rejects `forge.ops.eblu.me`
+  over HTTPS because the forge advertises `forge.eblu.me` as canonical; the
+  **SSH** URL on port 2222 sidesteps this and is the proven path (used by the
+  install how-to and the v1.0.3 redeploy). Pin self-update to the SSH URL;
+  capture any `insteadOf`/known_hosts setup needed headlessly.
+
+## Done when
+
+A hephd running as the installed service can, in its own environment, complete
+`cargo install --locked --git ssh://… --tag <known-good> hephd` non-interactively.
+Unblocks [[cargo-install-from-tag]]. Part of [[hephd-self-update]].
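One of the options above, sketched as a `std::process::Command` with an explicit environment rather than whatever launchd/systemd provides. Everything here is an assumption to be validated by this card: the helper name `service_cargo_command`, the PATH value, and the deploy-key path are hypothetical. `CARGO_NET_GIT_FETCH_WITH_CLI=true` makes cargo fetch through the `git` executable, which is what lets `GIT_SSH_COMMAND` take effect.

```rust
use std::process::Command;

/// Builds a `cargo` invocation that does not rely on an interactive shell's
/// PATH or SSH agent. `home` and `deploy_key` are caller-supplied paths.
fn service_cargo_command(home: &str, deploy_key: &str) -> Command {
    let mut cmd = Command::new(format!("{home}/.cargo/bin/cargo"));
    cmd.env("PATH", format!("{home}/.cargo/bin:/usr/local/bin:/usr/bin:/bin"))
        // Use the git CLI for fetches so GIT_SSH_COMMAND is honoured.
        .env("CARGO_NET_GIT_FETCH_WITH_CLI", "true")
        // BatchMode=yes makes ssh fail instead of prompting when headless.
        .env(
            "GIT_SSH_COMMAND",
            format!("ssh -i {deploy_key} -o IdentitiesOnly=yes -o BatchMode=yes"),
        );
    cmd
}
```

The same variables could instead be baked into the generated plist/unit; which of the two is preferable is exactly what this card is meant to establish.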
--- a/docs/how-to/self-update/service-respawn-on-clean-exit.md
+++ b/docs/how-to/self-update/service-respawn-on-clean-exit.md
@@ -0,0 +1,35 @@
+---
+title: Service respawn on clean exit
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires: []
+---
+
+# Service respawn on clean exit
+
+For "self-restart" to mean "exit and let the manager bring up the new binary",
+both service managers must respawn hephd after a **clean** (exit code 0)
+shutdown. Templates live in `crates/heph/src/service.rs`.
+
+## Current state (from research)
+
+- **launchd (macOS):** plist has `KeepAlive = true` → already respawns on clean
+  exit. No change needed.
+- **systemd (Linux):** unit is `Restart=on-failure` → a clean exit (code 0)
+  does **not** respawn. Self-restart would silently stop the daemon.
+
+## Deliverables
+
+- Change the systemd unit template to `Restart=always` (with a small
+  `RestartSec`) so a deliberate clean exit is respawned.
+- Note in install/upgrade docs that **already-installed services must be
+  reinstalled** (`heph daemon` re-generates the unit) to pick up the new
+  policy; otherwise self-restart won't work on existing Linux installs.
+
+## Done when
+
+On both platforms, a hephd that calls `exit(0)` is brought back up by the
+service manager. Pairs with [[self-restart-after-update]]. Part of
+[[hephd-self-update]].
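A sketch of what the systemd side of the template change might look like. The actual template in `crates/heph/src/service.rs` is not shown in this commit, so `render_systemd_unit`, the `Description`, and the `WantedBy` target are hypothetical; only `Restart=always` plus a small `RestartSec` come from the card.

```rust
/// Renders a minimal unit whose restart policy also covers a clean exit.
fn render_systemd_unit(exec_start: &str, restart_sec: u32) -> String {
    format!(
        "[Unit]\n\
         Description=hephd\n\
         \n\
         [Service]\n\
         ExecStart={exec_start}\n\
         Restart=always\n\
         RestartSec={restart_sec}\n\
         \n\
         [Install]\n\
         WantedBy=default.target\n"
    )
}
```

With `Restart=on-failure`, an exit with code 0 is treated as a deliberate stop; `Restart=always` is what turns the daemon's clean exit into a respawn.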
--- a/docs/how-to/self-update/verify-hub-dropout-resilience.md
+++ b/docs/how-to/self-update/verify-hub-dropout-resilience.md
@@ -0,0 +1,37 @@
+---
+title: Verify hub-dropout resilience
+modified: 2026-06-04
+tags:
+  - how-to
+status: active
+requires: []
+---
+
+# Verify hub-dropout resilience
+
+Owner requirement: "the hub can go poof at any moment" must be the **base
+case**, not a guard bolted on for self-update. A self-updating hub will restart
+under its spokes, so spokes must already shrug off an unreachable hub.
+
+## Current state (from research)
+
+Already largely true: `sync_once` (`crates/hephd/src/sync.rs`) propagates
+errors, and the background loop (`spawn_sync_loop`, `crates/hephd/src/server.rs`)
+catches them — `tracing::warn!("background sync failed: {e}")` — and continues.
+The local SQLite store stays writable, so the spoke works offline and
+reconciles on the next successful tick. No panic, no block.
+
+## Deliverables
+
+- Lock the guarantee in with an explicit test: a spoke whose hub is unreachable
+  for one or more sync cycles keeps serving local RPCs and accepting writes,
+  then reconciles when the hub returns.
+- If any path is found that *doesn't* degrade gracefully (a blocking call, an
+  unwrapped error, a restart that loses unsynced ops), fix it here — that is the
+  whole point of this card.
+
+## Done when
+
+A test demonstrates spoke survival across hub downtime, documenting the
+base-case guarantee that makes a self-updating hub safe. Part of
+[[hephd-self-update]].
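A toy model of the test this card asks for, using in-memory stand-ins rather than the real hephd types. `Spoke`, `FakeHub`, and `background_tick` are hypothetical; the shape mirrors the described behaviour, where `sync_once` propagates the error and the background loop logs it and continues.

```rust
/// In-memory stand-in for the hub; `reachable` simulates downtime.
#[derive(Default)]
struct FakeHub {
    reachable: bool,
    received: Vec<String>,
}

impl FakeHub {
    fn push(&mut self, ops: &[String]) -> Result<(), String> {
        if !self.reachable {
            return Err("hub unreachable".to_string());
        }
        self.received.extend_from_slice(ops);
        Ok(())
    }
}

/// In-memory stand-in for a spoke with a local store and an unsynced queue.
#[derive(Default)]
struct Spoke {
    local: Vec<String>,
    unsynced: Vec<String>,
}

impl Spoke {
    /// Local writes always succeed, whether or not the hub is up.
    fn write(&mut self, op: &str) {
        self.local.push(op.to_string());
        self.unsynced.push(op.to_string());
    }

    /// Propagates hub errors; the queue is cleared only after a successful push.
    fn sync_once(&mut self, hub: &mut FakeHub) -> Result<(), String> {
        hub.push(&self.unsynced)?;
        self.unsynced.clear();
        Ok(())
    }

    /// Mirrors the background loop: turn an error into a log line and carry on.
    fn background_tick(&mut self, hub: &mut FakeHub) -> Option<String> {
        self.sync_once(hub)
            .err()
            .map(|e| format!("background sync failed: {e}"))
    }
}
```

A test against this model writes during downtime, checks that ticks only produce warnings, then flips `reachable` and checks that every queued op arrives in order.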