Fix macOS heph daemon restart bootout→bootstrap race (5: Input/output error) #13
Summary
Follow-up to #12.
heph daemon restart on macOS intermittently failed with "launchctl bootstrap failed: 5: Input/output error". The cause: restart bootstrapped immediately after bootout, but launchctl bootout is asynchronous. launchd may still be killing/reaping the job and removing its label from the gui/$uid domain when the command returns, and bootstrapping into that transitional domain returns a generic EIO. Whether it races depends on how fast hephd (sync client + SQLite store/lock + a supervised heph-quickadd child) shuts down, so it surfaced intermittently.

Fix (launchd path only; systemd's restart is already a synchronous transaction):
- Wait for the old job to unload (wait_until_unloaded: poll launchctl print, bounded to 5s) before re-bootstrapping.
- Retry the bootstrap (launchd_bootstrap: up to 5 attempts, 200ms apart) to cover the residual settle window. start shares this helper too.
- When the plist is unchanged, use launchctl kickstart -k to restart the loaded job atomically, with no bootout/bootstrap and no race. Full reload (bootout + bootstrap) is reserved for genuine config changes, where launchd must re-read the plist.

Testing
- cargo test -p heph (existing 12 service tests green; clippy/fmt clean)
- heph daemon restart: kickstart path (plist unchanged) and reload path (config change). The launchctl-calling helpers aren't unit-testable.

Notes
kickstart -k restarts the loaded job definition, so it intentionally does not pick up an edited plist, which is exactly why it's gated on "plist unchanged."

🤖 Generated with Claude Code
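For readers who want the shape of the wait-then-retry logic without reading the diff, here is a minimal sketch. It is not the code in this branch: poll_until_unloaded and retry_bounded are hypothetical names, and the probe and the operation are passed in as closures so the timing logic can be exercised without launchd. In the real helpers the probe would presumably shell out to launchctl print and the operation to launchctl bootstrap, as described above.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `is_loaded` until it reports `false` or
/// `timeout` elapses. Returns `true` if the job was observed unloaded.
fn poll_until_unloaded<F>(mut is_loaded: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if !is_loaded() {
            return true; // label is gone from the domain
        }
        if Instant::now() >= deadline {
            return false; // give up; the caller decides what to do next
        }
        sleep(interval);
    }
}

/// Hypothetical helper: runs `op` until it succeeds, at most `attempts`
/// times (always at least once), sleeping `delay` after each failure
/// that will be retried. Returns the last error if every attempt fails.
fn retry_bounded<T, E, F>(mut op: F, attempts: u32, delay: Duration) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut tries = 0;
    loop {
        tries += 1;
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if tries >= attempts => return Err(err),
            Err(_) => sleep(delay),
        }
    }
}
```

With the bounds quoted in the summary, a caller would pass a 5s timeout to the poll and 5 attempts with a 200ms delay to the retry. The poll interval is not stated in this PR, so any value used there is a guess.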
heph daemon restart race-free

Verified live on gilbert (macOS launchd), built from this branch (hephd stays the installed 1.2.3):
- Kickstart path (heph daemon restart, plist unchanged): 3 consecutive restarts, all rc=0, fresh pid each time, no "5: Input/output error".
- Reload path: restart --self-update-interval-secs 300 then --self-update-interval-secs 600, both rc=0, plist re-read (interval 600→300→600), daemon running each time.
- heph next), daemon args retain all spoke flags (--hub-url/--oidc-*/--self-update --self-update-interval-secs 600), log shows clean "self-update enabled interval_secs=600 … listening".

Ran the old code ~once and reproduced the EIO; the new code restarted cleanly 5/5 times across both paths.
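The choice between the two paths verified here can be pictured roughly as follows. This is a sketch under assumptions, not the branch's implementation: RestartPlan, plan_restart, and kickstart_args are hypothetical names, and comparing the installed plist text with the freshly rendered one is only one plausible way to decide "plist unchanged". The kickstart -k subcommand and the gui/<uid>/<label> service target are standard launchctl syntax.

```rust
/// Hypothetical enum naming the two restart strategies.
#[derive(Debug, PartialEq, Eq)]
enum RestartPlan {
    /// Plist unchanged: restart the already-loaded job in place.
    Kickstart,
    /// Plist changed or missing: bootout, wait for unload, bootstrap.
    Reload,
}

/// Picks a plan by comparing the plist on disk (if any) with the one
/// the current flags would render. Assumption: byte-for-byte equality
/// is the "unchanged" test.
fn plan_restart(installed_plist: Option<&str>, desired_plist: &str) -> RestartPlan {
    match installed_plist {
        Some(current) if current == desired_plist => RestartPlan::Kickstart,
        _ => RestartPlan::Reload,
    }
}

/// Builds the argument list for `launchctl kickstart -k <service-target>`
/// in the per-user GUI domain.
fn kickstart_args(uid: u32, label: &str) -> Vec<String> {
    vec![
        "kickstart".to_string(),
        "-k".to_string(),
        format!("gui/{}/{}", uid, label),
    ]
}
```

For example, kickstart_args(501, "com.example.hephd") yields the arguments for launchctl kickstart -k gui/501/com.example.hephd. The label is a placeholder, since the real one is not given in this PR.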