C2: hephd self-update (Mikado plan — cards for review) #7

Merged
eblume merged 19 commits from mikado/hephd-self-update into main 2026-06-04 15:03:23 -07:00
Owner

C2 Mikado chain: opt-in hephd self-update

This PR opens the planning layer of a C2 Mikado chain (cards only, no code yet) for review of the dependency graph before implementation cycles begin. From Hephaestus task 01KTA2NSNRYT902HC3VRW00S1J.

Goal: an opt-in, default-off mode where hephd polls the forge for a newer release and, when one exists, rebuilds via cargo install from the release tag and restarts itself onto the new binary.

Owner decisions baked in

  • Default off; opt-in only.
  • Delivery = cargo install from the tag for now (prebuilt binaries a possible future).
  • "The hub can vanish at any moment" is the base case, not a guard — verified/locked in rather than special-cased.

Dependency graph

GOAL  hephd-self-update
        ├── self-restart-after-update
        │     ├── cargo-install-from-tag
        │     │     ├── self-update-poll-loop
        │     │     │     ├── release-poll-version-check   (leaf)
        │     │     │     └── self-update-opt-in-flag       (leaf)
        │     │     └── service-env-forge-access            (leaf — the cargo/forge blocker)
        │     └── service-respawn-on-clean-exit             (leaf — systemd Restart=always)
        └── verify-hub-dropout-resilience                   (leaf)

The two leaves under self-update-poll-loop, together with the poll loop itself, yield a working notify-only daemon first; the apply path (service-env-forge-access → cargo-install-from-tag → service-respawn-on-clean-exit → self-restart-after-update) layers on auto-apply.

Key findings driving the cards

  • launchd already respawns on clean exit (KeepAlive=true); systemd is Restart=on-failure and won't — needs Restart=always (service-respawn-on-clean-exit).
  • The cargo/forge blocker is real for the service environment: the SSH URL works interactively, but launchd/systemd likely lack cargo on PATH + forge SSH (service-env-forge-access).
  • Hub-dropout resilience already largely holds (sync loop logs + continues) — verify-hub-dropout-resilience locks it in with a test.

Review asks

  • Is the decomposition right? (C2 supports a branch reset to revise the plan.)
  • Any leaf you'd rather drop/merge, or a notify-only-first staging preference?

Next per the C2 process: pick a ready leaf and start a work cycle (impl → close). mise run docs-mikado hephd-self-update renders the live graph.

🤖 Generated with Claude Code

C2(hephd-self-update): plan add goal + prerequisite cards for hephd self-update
All checks were successful
Build / validate (pull_request) Successful in 5m23s
e6524fddbb
Kick off the C2 Mikado chain for an opt-in (default-off) hephd
self-update mode (forge-poll -> cargo install from tag -> self-restart).
Goal card plus eight prerequisite cards, indexed from how-to.md:

  release-poll-version-check, self-update-opt-in-flag (leaves)
    -> self-update-poll-loop                 (notify-only core)
  service-env-forge-access (leaf, the cargo/forge blocker)
    + self-update-poll-loop -> cargo-install-from-tag
  service-respawn-on-clean-exit (leaf, systemd Restart=always)
    + cargo-install-from-tag -> self-restart-after-update
  verify-hub-dropout-resilience (leaf, lock in the base-case guarantee)

Grounded in research of hephd's sync loop, daemon lifecycle, the
launchd/systemd service templates, and the forge releases API.
Captured from Hephaestus task 01KTA2NSNRYT902HC3VRW00S1J.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add crates/hephd/src/selfupdate.rs: a pure update_available() that
compares the running heph_core::VERSION (e.g. "1.0.3 (sha)") against a
release tag ("v1.0.4") via semver, ignoring the build suffix and v
prefix; plus parse_latest_tag() / fetch_latest_tag() for the forge
releases/latest feed. Decision logic and JSON parsing are unit-tested
against sample payloads; the network fetch is isolated. Adds the semver
workspace dep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version-compare + forge release parsing landed and unit-tested.
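The comparison described above can be sketched with the standard library alone. This is a simplified illustration, not the code in selfupdate.rs: the PR uses the semver crate, whereas this sketch hand-parses a plain MAJOR.MINOR.PATCH triple and ignores pre-release identifiers. The helper name parse_version is hypothetical.

```rust
/// Parse "MAJOR.MINOR.PATCH" out of strings like "1.0.3 (sha)" or "v1.0.4".
/// Simplified stand-in for semver parsing: no pre-release or build metadata.
fn parse_version(raw: &str) -> Option<(u64, u64, u64)> {
    // Keep only the first whitespace-separated token, dropping a build suffix.
    let token = raw.split_whitespace().next()?;
    // Drop the optional leading 'v' used by release tags.
    let token = token.strip_prefix('v').unwrap_or(token);
    let mut parts = token.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components: not a plain x.y.z
    }
    Some((major, minor, patch))
}

/// True only when both versions parse and the tag is strictly newer.
fn update_available(running: &str, latest_tag: &str) -> bool {
    match (parse_version(running), parse_version(latest_tag)) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(current), Some(latest)) => latest > current,
        // Unparseable input never triggers an update.
        _ => false,
    }
}
```

Comparing numeric tuples rather than strings matters for cases such as 1.0.9 versus 1.0.10, where a string comparison would order them incorrectly.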
Add --self-update (default off) and --self-update-interval-secs to the
hephd CLI, a SelfUpdateConfig (Some => enabled), and thread it into the
Daemon (with_self_update) for every mode. spawn_self_update_loop()
currently just announces the mode at startup ('self-update enabled')
so the opt-in is observable; the poll/apply cycle is wired in later
leaves. Omitting the flag leaves behaviour unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
--self-update flag + config plumbing landed; opt-in observable via the
startup log line.
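A minimal sketch of the "Some => enabled" shape follows. It is an assumption-laden illustration: the real CLI parsing is not shown in this PR, so the hand-rolled argument scan, the function name self_update_config, and the DEFAULT_INTERVAL_SECS value are all hypothetical.

```rust
use std::time::Duration;

/// Presence of a config means self-update is enabled (Some => enabled).
#[derive(Debug, Clone, PartialEq)]
struct SelfUpdateConfig {
    interval: Duration,
}

/// Hypothetical default; the PR does not state the real default interval.
const DEFAULT_INTERVAL_SECS: u64 = 3600;

/// Build the optional config from raw arguments.
/// Without --self-update the result is None, so behaviour stays unchanged.
fn self_update_config(args: &[&str]) -> Option<SelfUpdateConfig> {
    if !args.iter().any(|a| *a == "--self-update") {
        return None;
    }
    // Read the value following --self-update-interval-secs, if present and numeric.
    let secs = args
        .iter()
        .position(|a| *a == "--self-update-interval-secs")
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(DEFAULT_INTERVAL_SECS);
    Some(SelfUpdateConfig {
        interval: Duration::from_secs(secs),
    })
}
```

Modelling the setting as an Option keeps the default-off path explicit: code that receives None has nothing to poll and nothing to apply.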
Add a ReleaseSource trait (real ForgeReleaseSource over HTTP; injectable
for tests), check_release() returning a CheckOutcome
(UpToDate/UpdateAvailable/Failed) that never errors so a flaky forge
can't stall the daemon, and run_poll_loop() that ticks on the configured
interval and logs when a newer release is available. spawn_self_update_loop
now spawns the real poller. Detection is unit-tested with a stubbed source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
C2(hephd-self-update): close self-update-poll-loop
All checks were successful
Build / validate (pull_request) Successful in 3m56s
758854478b
Notify-only poller landed: ticks on the interval, logs when a newer
release is available. The daemon now self-reports update availability.
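The injectable-source design can be illustrated roughly as below. The names ReleaseSource, CheckOutcome and check_release come from the commit message; the signatures, the String error type, the StubSource double and the simplified parse helper (standing in for semver) are assumptions made for this sketch. The real poller is presumably asynchronous; this version is synchronous for brevity.

```rust
/// Where the latest release tag comes from; injectable so tests need no network.
trait ReleaseSource {
    fn latest_tag(&self) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
enum CheckOutcome {
    UpToDate,
    UpdateAvailable(String),
    Failed(String),
}

/// Simplified x.y.z parser standing in for a semver comparison.
fn parse(raw: &str) -> Option<(u64, u64, u64)> {
    let token = raw.split_whitespace().next()?;
    let token = token.strip_prefix('v').unwrap_or(token);
    let nums: Vec<u64> = token
        .split('.')
        .map(|p| p.parse().ok())
        .collect::<Option<_>>()?;
    match nums.as_slice() {
        [a, b, c] => Some((*a, *b, *c)),
        _ => None,
    }
}

/// Never returns Err: every failure is folded into CheckOutcome::Failed,
/// so a flaky source cannot abort the caller's loop.
fn check_release(source: &dyn ReleaseSource, running: &str) -> CheckOutcome {
    let tag = match source.latest_tag() {
        Ok(tag) => tag,
        Err(e) => return CheckOutcome::Failed(e),
    };
    match (parse(running), parse(&tag)) {
        (Some(cur), Some(new)) if new > cur => CheckOutcome::UpdateAvailable(tag),
        (Some(_), Some(_)) => CheckOutcome::UpToDate,
        _ => CheckOutcome::Failed(format!("unparseable version: {} vs {}", running, tag)),
    }
}

/// Test double returning a fixed result.
struct StubSource(Result<String, String>);

impl ReleaseSource for StubSource {
    fn latest_tag(&self) -> Result<String, String> {
        self.0.clone()
    }
}
```

Because the outcome is a plain enum rather than a Result, a poll loop built on it can log a Failed value and wait for the next tick without any error propagation.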
Self-restart works by exiting cleanly and letting the service manager
respawn the new binary. launchd already does this (KeepAlive=true), but
the systemd user unit was Restart=on-failure, which ignores a clean
exit (code 0). Switch to Restart=always + RestartSec=1, update the unit
test, and note in run-the-daemon that existing Linux installs must
`heph daemon restart` once to regenerate the unit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
systemd unit now Restart=always; both managers respawn after a clean exit.
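For illustration, a unit template with the new restart policy might be rendered like this. It is a sketch only: the function name render_systemd_unit, the exec_start parameter, and the Description and WantedBy lines are assumptions, and the real template likely carries more fields. Only Restart=always and RestartSec=1 are taken from the commit above.

```rust
/// Render a minimal systemd user unit for the daemon.
/// `exec_start` is a hypothetical parameter naming the binary to run.
fn render_systemd_unit(exec_start: &str) -> String {
    format!(
        "[Unit]\n\
         Description=hephd\n\
         \n\
         [Service]\n\
         ExecStart={}\n\
         Restart=always\n\
         RestartSec=1\n\
         \n\
         [Install]\n\
         WantedBy=default.target\n",
        exec_start
    )
}
```

With Restart=on-failure, systemd leaves a service stopped after exit code 0; Restart=always restarts it regardless of exit status, which is what an exit-then-respawn update relies on.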
Lock in the base-case guarantee that a self-updating hub (which restarts
under its spokes) relies on. New sync_http test: a spoke whose hub is
unreachable keeps serving + accepting writes, a sync attempt fails fast
(Err, not hang/panic), and when the hub returns the accumulated ops
reconcile with no special recovery.

The verification surfaced one non-graceful path — the daemon's shared
reqwest client had no timeout, so a black-hole hub (connects, never
replies) could stall the sync/self-update loop. Give it a 30s timeout so
'the hub can vanish at any moment' holds even mid-request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
C2(hephd-self-update): close verify-hub-dropout-resilience
All checks were successful
Build / validate (pull_request) Successful in 5m24s
fd76aa0b3a
Spoke survival across hub downtime is now covered by a test; added a
client timeout so a black-hole hub can't stall the loop.
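The black-hole case can be reproduced at the socket level with the standard library. This sketch is not the PR's fix, which sets a timeout on the shared reqwest client; it only demonstrates the same bounded-wait idea. The names read_with_timeout and black_hole are hypothetical.

```rust
use std::io::Read;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Connect to `addr` and wait at most `timeout` for the first reply bytes.
/// Returns Err instead of blocking forever when the peer never answers.
fn read_with_timeout(addr: SocketAddr, timeout: Duration) -> std::io::Result<usize> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 64];
    stream.read(&mut buf)
}

/// A listener that is bound but never accepts or replies: connections
/// complete in the kernel backlog and then receive nothing.
fn black_hole() -> std::io::Result<TcpListener> {
    TcpListener::bind("127.0.0.1:0")
}
```

Without the read timeout, the read call would block indefinitely against such a peer, which is the stall the commit above describes.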
Add an Installer trait + CargoInstaller (runs cargo install --locked
--git <ssh> --tag <tag> for heph/hephd/heph-tui/heph-quickadd — the
documented install command, via the SSH host that sidesteps the
cargo/forge canonical-name mismatch), and apply_update() which runs the
blocking install on the blocking pool. The poll loop now applies on a
detected update. Apply path is unit-tested with a fake installer (call +
failure paths); the real cargo subprocess is never run in tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Installer trait + CargoInstaller + apply_update landed and unit-tested
via injection. Real cargo execution is gated on the deployment env
(service-env-forge-access).
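The install invocation can be sketched as a command builder that is constructed but never executed, which is also how it stays testable. The function name cargo_install_command is hypothetical, and installing all four packages in a single cargo invocation is an assumption; the PR does not say whether CargoInstaller runs one command or several.

```rust
use std::process::Command;

/// Packages named in the commit message above.
const PACKAGES: [&str; 4] = ["heph", "hephd", "heph-tui", "heph-quickadd"];

/// Build (but do not run) a `cargo install` invocation for a release tag.
/// `git_url` is passed in so the caller decides which host to clone from.
fn cargo_install_command(git_url: &str, tag: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(&["install", "--locked", "--git", git_url, "--tag", tag]);
    cmd.args(&PACKAGES);
    cmd
}
```

Keeping construction separate from execution lets a test inspect the argument list, while the real subprocess stays behind the Installer trait.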
Add a Restarter trait + ProcessRestarter (exit 0 so launchd KeepAlive /
systemd Restart=always respawn the new binary). apply_update now installs
then restarts, and the restart fires only on a successful install. Wired
into the poll loop. Unit-tested with fake installer+restarter: restart on
success, no restart after a failed install. Real process exit is never
run in tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Restarter + ProcessRestarter wired: install then exit(0) so the service
manager respawns the new binary; restart only on a successful install.
Unit-tested via injection.
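The install-then-restart ordering, and the way fakes keep the real process exit out of tests, can be sketched as follows. The trait names Installer and Restarter and the function name apply_update come from the commits above; the signatures, the String error type and the two fakes are assumptions. The real apply_update runs the install on a blocking pool; this version is synchronous.

```rust
use std::cell::Cell;

trait Installer {
    fn install(&self, tag: &str) -> Result<(), String>;
}

trait Restarter {
    fn restart(&self);
}

/// Install first; request a restart only if the install succeeded.
fn apply_update(
    installer: &dyn Installer,
    restarter: &dyn Restarter,
    tag: &str,
) -> Result<(), String> {
    installer.install(tag)?; // a failed install returns early, skipping the restart
    restarter.restart();
    Ok(())
}

/// Fake installer with a fixed outcome.
struct FakeInstaller {
    succeed: bool,
}

impl Installer for FakeInstaller {
    fn install(&self, tag: &str) -> Result<(), String> {
        if self.succeed {
            Ok(())
        } else {
            Err(format!("install of {} failed", tag))
        }
    }
}

/// Fake restarter that counts calls instead of exiting the process.
#[derive(Default)]
struct FakeRestarter {
    calls: Cell<u32>,
}

impl Restarter for FakeRestarter {
    fn restart(&self) {
        self.calls.set(self.calls.get() + 1);
    }
}
```

A production Restarter would exit with code 0 and rely on the service manager to respawn the new binary; the counting fake makes both the success and failure paths observable.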
C2(hephd-self-update): impl correct spawn_self_update_loop doc
All checks were successful
Build / validate (pull_request) Successful in 6m1s
20418240f7
The poller now installs + restarts (not just logs); fix the stale doc and
point at service-env-forge-access as the deployment step that makes the
apply path operational.
The repo is public, so self-update needs no credentials: cargo install
--git is a plain anonymous clone (NOT the access-restricted Forgejo cargo
registry, which is what required forge.ops.eblu.me). Point INSTALL_GIT_URL
and the releases poll at the canonical public host over HTTPS — verified
end-to-end (cargo install --git https://forge.eblu.me/... --tag v1.0.3
builds a working hephd with zero auth).

Make the headless service able to run the apply path: 'heph daemon
start --self-update' (default off) generates a launchd/systemd service
that passes --self-update and bakes a PATH (incl ~/.cargo/bin) + HOME so
the minimal service env can find cargo. restart preserves the setting.
Default (no flag) services are byte-identical to before. Template + URL
behavior covered by unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Public repo => anonymous HTTPS clone, no credentials (the SSH/canonical
premise was wrong: that was the access-restricted cargo registry, not git
clone). Install URL points at the canonical public host (verified end to
end); the service template bakes cargo onto PATH. Card rewritten to
reflect what actually happened.
C2(hephd-self-update): finalize — changelog + mark goal implemented
All checks were successful
Build / validate (pull_request) Successful in 5m37s
c321d72e7d
All eight prerequisite leaves are closed; the daemon-side feature is
implemented and the cargo-install-over-public-HTTPS mechanism is verified
end-to-end. Add the changelog fragment and drop the goal card's
status/branch, noting the one remaining owner check: observing a real
older->newer upgrade on the next release.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author
Owner

Update: chain finalized — all 8 leaves closed

The public-repo correction removed the only real blocker (service-env-forge-access). cargo install --git is a plain anonymous clone of the public repo — not the access-restricted Forgejo cargo registry (that was what required forge.ops.eblu.me). So: no SSH, no credentials.

Done since last update:

  • INSTALL_GIT_URL + release poll → canonical public HTTPS host. Verified end-to-end: cargo install --git https://forge.eblu.me/eblume/hephaestus.git --tag v1.0.3 hephd builds a working binary with zero auth.
  • heph daemon start --self-update (default off) generates a launchd/systemd service that passes --self-update and bakes a cargo-capable PATH/HOME; restart preserves it. Default services unchanged.
  • service-env-forge-access closed (card corrected — the SSH premise was wrong); goal finalized; changelog added.

One owner check remains: a live older→newer upgrade can only be observed at the next release. Enable --self-update then and confirm. All detection/apply logic is unit-tested and the install mechanism is verified, so this is a smoke test, not a risk.

Ready for review/merge.

eblume force-pushed mikado/hephd-self-update from c321d72e7d
All checks were successful
Build / validate (pull_request) Successful in 5m37s
to 443763489b
All checks were successful
Build / validate (pull_request) Successful in 6m10s
2026-06-04 15:01:14 -07:00
eblume merged commit 529f8b67d1 into main 2026-06-04 15:03:23 -07:00
Reference
eblume/hephaestus!7