hephaestus/docs/how-to/fuzz-testing.md
Erich Blume e7ced4f8f9
test(fuzz): add cargo-fuzz targets for CRDT and extraction surfaces
Tier 2 fuzzing: a nightly cargo-fuzz crate at crates/heph-core/fuzz/ with
three targets (crdt_merge, crdt_write, extract), reaching crate-private CRDT
internals through heph-core's new 'fuzzing' feature. Driven ad-hoc via
'mise run fuzz'; not in CI (needs nightly + wall clock).

crdt_merge immediately surfaced robustness gaps in yrs 0.27 on malformed sync
deltas (a 4-byte input OOMs; other inputs abort/UB) — uncatchable, limited
blast radius (authenticated /sync/push), documented as a known limitation.
extract and crdt_write ran clean over ~1M cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 13:03:10 -07:00


title: Fuzz Testing
modified: 2026-06-09
tags: how-to

Fuzz Testing

heph's parsing layer is pure and clock-injected, which makes it a natural fit for randomized testing. Fuzzing runs at two tiers:

Tier 1 — property tests (proptest, stable Rust, runs in CI)

Property-based tests live alongside the unit tests in each module and run as part of the normal cargo test suite — no extra tooling, and CI picks them up via the standard build hook.

Covered invariants:

| Module | Invariants |
| --- | --- |
| heph-core/src/extract.rs | extraction never panics and is idempotent; links are non-empty/trimmed/deduped; context_item_lines aligns 1:1 with context_items |
| heph-core/src/wikilink.rs | expand/collapse are idempotent; collapse(expand(x)) == collapse(x) |
| heph-core/src/crdt.rs | a write materializes exactly; concurrent edits converge regardless of merge order; merge is idempotent; merging arbitrary garbage bytes never panics |
| heph-core/src/frontmatter.rs | strip is idempotent and always returns a suffix of its input |
| heph-core/src/recurrence.rs | checkbox reset properties (pre-existing); next_occurrence is strictly after its after argument; arbitrary RRULE strings never panic |
| heph-core/src/hlc.rs | HLC ordering properties (pre-existing); parse never panics |
| hephd/src/datespec.rs | parse_date never panics (including huge +N offsets); offsets and ISO dates round-trip |
| hephd/src/quickadd.rs | parse never panics; title words always come from the input |

Run them with the rest of the suite:

cargo test
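To make one row of the table concrete, here is a minimal, dependency-free sketch of the links invariants (non-empty, trimmed, deduped, idempotent). It is not heph's code: normalize_links is a hypothetical stand-in for the real extraction step, and the hand-rolled XorShift generator only approximates what proptest's strategies and shrinking provide in the real tests.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the link-normalizing part of extraction:
/// trims each link, drops empty ones, and dedupes while preserving order.
fn normalize_links(raw: &[String]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for link in raw {
        let trimmed = link.trim();
        if !trimmed.is_empty() && seen.insert(trimmed.to_string()) {
            out.push(trimmed.to_string());
        }
    }
    out
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generates up to five short strings over an alphabet that includes
/// whitespace and a multibyte char, so trimming and deduping get exercised.
fn random_links(rng: &mut XorShift) -> Vec<String> {
    const ALPHABET: [char; 5] = ['a', 'b', ' ', '\t', 'é'];
    let count = (rng.next_u64() % 6) as usize;
    (0..count)
        .map(|_| {
            let len = (rng.next_u64() % 4) as usize;
            (0..len)
                .map(|_| ALPHABET[(rng.next_u64() % 5) as usize])
                .collect::<String>()
        })
        .collect()
}

/// Returns true if every invariant holds for `cases` generated inputs.
fn links_invariants_hold(seed: u64, cases: usize) -> bool {
    let mut rng = XorShift(seed);
    (0..cases).all(|_| {
        let once = normalize_links(&random_links(&mut rng));
        let twice = normalize_links(&once);
        let unique: HashSet<&String> = once.iter().collect();
        once == twice                                        // idempotent
            && once.iter().all(|l| !l.is_empty() && l.trim() == l.as_str())
            && unique.len() == once.len()                    // deduped
    })
}
```

A unit test would call links_invariants_hold with a fixed non-zero seed and assert the result; with proptest, a failing case would additionally be shrunk to a minimal input.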

Tier 2 — coverage-guided fuzzing (cargo-fuzz, nightly, run ad-hoc)

libFuzzer targets live in crates/heph-core/fuzz/. These are for the surfaces where coverage guidance beats random generation — chiefly the CRDT layer, which decodes attacker-controllable bytes arriving via sync (yrs update payloads in op-log entries).

Targets:

  • crdt_merge — feeds arbitrary (state, delta) byte pairs to merge_body; asserts no panic and merge idempotence. This is the remote-input surface.
  • crdt_write — arbitrary (prev, new) string pairs through write_body; asserts the diff/CRDT round-trip materializes new exactly (UTF-8 boundary stress).
  • extract — arbitrary markdown through extract + context_item_lines; asserts the 1:1 alignment invariant promotion depends on.

Requirements: a nightly toolchain (rustup toolchain install nightly) and cargo-fuzz (cargo install cargo-fuzz). The fuzz targets reach crate-private CRDT internals through the fuzzing cargo feature of heph-core, which exposes thin public wrappers; the feature is never enabled in normal builds.
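A feature-gated wrapper of this kind might look like the sketch below. The module layout, the wrapper name, and the merge_body signature and body are all assumptions for illustration; only the idea (a public shim compiled solely under the fuzzing feature) comes from the text above.

```rust
// Assumed Cargo.toml entry for the crate (hypothetical):
//
//   [features]
//   fuzzing = []

mod crdt {
    /// Crate-private in normal builds. The body is a placeholder that just
    /// concatenates its inputs; the real function decodes and applies updates.
    pub(crate) fn merge_body(state: &[u8], delta: &[u8]) -> Vec<u8> {
        let mut merged = state.to_vec();
        merged.extend_from_slice(delta);
        merged
    }
}

/// Compiled only when the crate is built with `--features fuzzing`, so
/// ordinary builds never export the internals.
#[cfg(feature = "fuzzing")]
pub mod fuzzing {
    /// Thin public wrapper: forwards to the crate-private function unchanged.
    pub fn merge_body(state: &[u8], delta: &[u8]) -> Vec<u8> {
        crate::crdt::merge_body(state, delta)
    }
}
```

Keeping the wrapper a pure pass-through means the fuzzer exercises exactly the code path production uses, with no fuzz-only logic to drift out of sync.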

Run all targets briefly (default 60s each), or one target for longer:

mise run fuzz             # all targets, 60s each
mise run fuzz 300         # all targets, 5 min each
cargo +nightly fuzz run crdt_merge --fuzz-dir crates/heph-core/fuzz -- -max_total_time=3600

Crash artifacts land in crates/heph-core/fuzz/artifacts/<target>/; the corpus accumulates in crates/heph-core/fuzz/corpus/<target>/ (both gitignored). Reproduce a crash with cargo +nightly fuzz run <target> --fuzz-dir crates/heph-core/fuzz <artifact-path>.

Tier 2 is deliberately not wired into CI: it needs nightly and meaningful wall clock to earn its keep. Run it ad-hoc after touching crdt.rs, extract.rs, or the sync payload path. If it ever moves to CI, a scheduled (not per-push) workflow with a persistent corpus is the right shape.

Findings so far

The first runs paid for themselves. Tier 1 proptests found two reachable panics on user input, both fixed in the same change:

  • datespec::parse_offset panicked on a large relative offset (e.g. +999999999999d) because chrono's + operator panics on overflow; now uses checked arithmetic and returns an out-of-range error.
  • datespec::parse_month_day sliced a token on a non-char boundary for multibyte input (e.g. an every <Month> <day> phrase containing 𐻂); now takes the first three chars.
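Both fixes follow common patterns, sketched below with std only. The function names, the use of an i32 day count in place of a chrono date, and the error strings are assumptions for illustration, not the actual datespec code.

```rust
/// Hypothetical offset parser: accepts "+<N>d" and adds N days to `today`,
/// where dates are modelled as an i32 day count instead of a chrono date.
fn apply_day_offset(today: i32, spec: &str) -> Result<i32, String> {
    let digits = spec
        .strip_prefix('+')
        .and_then(|s| s.strip_suffix('d'))
        .filter(|s| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()))
        .ok_or_else(|| format!("not a day offset: {spec}"))?;

    // Parsing can fail for numbers too large for i32: report, don't panic.
    let days: i32 = digits
        .parse()
        .map_err(|_| format!("offset out of range: {spec}"))?;

    // checked_add returns None on overflow, where a plain `+` would panic
    // in debug builds and wrap in release builds.
    today
        .checked_add(days)
        .ok_or_else(|| format!("offset out of range: {spec}"))
}

/// Hypothetical month-token helper: takes the first three *chars*, lowercased.
/// Slicing with `&token[..3]` would panic whenever byte 3 falls inside a
/// multibyte character.
fn month_prefix(token: &str) -> String {
    token.chars().take(3).flat_map(char::to_lowercase).collect()
}
```

For example, month_prefix("ééb") returns the whole string, whereas a byte slice of the same token at index 3 would land in the middle of the second é and panic.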

Tier 2 (crdt_merge) surfaced robustness gaps in yrs 0.27 on malformed update bytes, reachable through the authenticated /sync/push path:

  • a tiny delta [255, 255, 255, 126] triggers a huge allocation → OOM;
  • some inputs trip a debug_assert! in the yrs block decoder (unwinding panic — contained by the catch_unwind in merge_body);
  • at least one class of inputs hits genuine UB (an invalid char value) → SIGABRT under debug UB-checks, silent UB in release.

These are not fully fixable in-tree: yrs exposes no pre-apply validator, and the OOM/abort classes are uncatchable. The blast radius is limited (the sync endpoint is authenticated), but a buggy or hostile authenticated peer can still crash a daemon. The catch_unwind in merge_body is only a partial mitigation, since it contains unwinding panics but not the OOM or abort classes; durable fixes need upstream yrs work or a bounded decoder. Until then this is a known limitation, tracked here and reproduced by the crdt_merge target.
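The two mitigation ideas can be sketched with std only. This is an illustration under stated assumptions, not a description of yrs: the wire format here is a hypothetical LEB128-style length prefix followed by a payload, and guarded, read_varint, and checked_payload are made-up names. The point of the bounded check is to reject a declared length before anything is allocated for it.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `f` and turns an unwinding panic into an Err. This cannot intercept
/// aborts or out-of-memory failures, and does nothing under panic=abort.
fn guarded<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|_| "decoder panicked".to_string())
}

/// Reads a LEB128-style unsigned varint (assumed format). Returns the value
/// and the number of bytes consumed, or None if the input is truncated or
/// the varint is longer than 10 bytes.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Bounded decode step: a declared length larger than the bytes that are
/// actually present is rejected up front, so a 4-byte input can never
/// request a huge buffer.
fn checked_payload(input: &[u8]) -> Result<&[u8], String> {
    let (declared, used) =
        read_varint(input).ok_or_else(|| "truncated length prefix".to_string())?;
    let rest = &input[used..];
    let len = usize::try_from(declared).map_err(|_| "length overflows usize".to_string())?;
    if len > rest.len() {
        return Err(format!("declared {len} bytes, only {} available", rest.len()));
    }
    Ok(&rest[..len])
}
```

Under this assumed encoding, the bytes [255, 255, 255, 126] declare a length of 266,338,303 with no payload behind it, so checked_payload returns an error instead of attempting the allocation.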

Why these targets

The high-value surfaces, ranked when this was set up:

  1. crdt::merge_body — decodes untrusted bytes from sync peers; a panic here is a remote-input daemon crash.
  2. extract — custom scanning logic layered over pulldown-cmark; promotion rewrites body lines based on its output, so misalignment corrupts bodies.
  3. wikilink rewriting — span arithmetic where off-by-ones hide.
  4. datespec/quickadd — user-typed input parsed inside the daemon process.

Crashes found in dependencies (yrs, rrule, pulldown-cmark) are still real hephd crashes: handle them by validating input before the call or catching panics around it, and report them upstream.