From 4960e72e762b8857824ecfa174c7dccb3c25a7b3 Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Tue, 9 Jun 2026 12:45:41 -0700
Subject: [PATCH] docs: add fuzz-testing how-to (two-tier proptest + cargo-fuzz plan)

Co-Authored-By: Claude Fable 5
---
 docs/how-to/fuzz-testing.md | 97 +++++++++++++++++++++++++++++++++++++
 docs/how-to/how-to.md       |  1 +
 2 files changed, 98 insertions(+)
 create mode 100644 docs/how-to/fuzz-testing.md

diff --git a/docs/how-to/fuzz-testing.md b/docs/how-to/fuzz-testing.md
new file mode 100644
index 0000000..2f4eacc
--- /dev/null
+++ b/docs/how-to/fuzz-testing.md
@@ -0,0 +1,97 @@
+---
+title: Fuzz Testing
+modified: 2026-06-09
+tags:
+  - how-to
+---
+
+# Fuzz Testing
+
+heph's parsing layer is pure and clock-injected, which makes it a natural fit
+for randomized testing. Fuzzing runs at two tiers:
+
+## Tier 1 — property tests (proptest, stable Rust, runs in CI)
+
+Property-based tests live alongside the unit tests in each module and run as
+part of the normal `cargo test` suite — no extra tooling, and CI picks them up
+via the standard build hook.
+
+Covered invariants:
+
+| Module | Invariants |
+|--------|-----------|
+| `heph-core/src/extract.rs` | extraction never panics, is idempotent; links are non-empty/trimmed/deduped; `context_item_lines` aligns 1:1 with `context_items` |
+| `heph-core/src/wikilink.rs` | `expand`/`collapse` are idempotent; `collapse(expand(x)) == collapse(x)` |
+| `heph-core/src/crdt.rs` | a write materializes exactly; concurrent edits converge regardless of merge order; merge is idempotent; merging arbitrary garbage bytes never panics |
+| `heph-core/src/frontmatter.rs` | `strip` is idempotent and always returns a suffix of its input |
+| `heph-core/src/recurrence.rs` | checkbox reset properties (pre-existing); `next_occurrence` is strictly after `after`; arbitrary RRULE strings never panic |
+| `heph-core/src/hlc.rs` | HLC ordering properties (pre-existing); `parse` never panics |
+| `hephd/src/datespec.rs` | `parse_date` never panics (including huge `+N` offsets); offsets and ISO dates round-trip |
+| `hephd/src/quickadd.rs` | `parse` never panics; title words always come from the input |
+
+Run them with the rest of the suite:
+
+```bash
+cargo test
+```
+
+## Tier 2 — coverage-guided fuzzing (cargo-fuzz, nightly, run ad-hoc)
+
+libFuzzer targets live in `crates/heph-core/fuzz/`. These are for the surfaces
+where coverage guidance beats random generation — chiefly the CRDT layer, which
+decodes attacker-controllable bytes arriving via sync (`yrs` update payloads in
+op-log entries).
+
+Targets:
+
+- `crdt_merge` — feeds arbitrary `(state, delta)` byte pairs to `merge_body`;
+  asserts no panic and merge idempotence. This is the remote-input surface.
+- `crdt_write` — arbitrary `(prev, new)` string pairs through `write_body`;
+  asserts the diff/CRDT round-trip materializes `new` exactly (UTF-8 boundary
+  stress).
+- `extract` — arbitrary markdown through `extract` + `context_item_lines`;
+  asserts the 1:1 alignment invariant promotion depends on.
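The idempotence assertion that the `crdt_merge` target makes can be illustrated without any fuzzing tooling. The following is a minimal, std-only sketch under loud assumptions: `split_input`, `toy_merge`, and `check_merge_idempotent` are hypothetical names, the one-byte split encoding is invented for illustration, and `toy_merge` is a stand-in set-union merge, not heph's `merge_body` or anything from `yrs`. In an actual cargo-fuzz target, a check of this shape would presumably sit inside the `fuzz_target!` closure and call the wrapper exposed by the `fuzzing` feature.

```rust
use std::collections::BTreeSet;

/// Split one fuzzer-provided buffer into a `(state, delta)` pair.
/// Hypothetical encoding: the first byte picks the split point.
fn split_input(data: &[u8]) -> (&[u8], &[u8]) {
    let Some((&len, rest)) = data.split_first() else {
        return (data, data); // empty input: both halves are empty
    };
    rest.split_at((len as usize).min(rest.len()))
}

/// Stand-in merge (NOT heph's `merge_body`): sorted, deduplicated union of bytes.
fn toy_merge(state: &[u8], delta: &[u8]) -> Vec<u8> {
    let set: BTreeSet<u8> = state.iter().chain(delta.iter()).copied().collect();
    set.into_iter().collect()
}

/// Idempotence: applying the same delta a second time must change nothing.
fn check_merge_idempotent(merge: impl Fn(&[u8], &[u8]) -> Vec<u8>, data: &[u8]) -> bool {
    let (state, delta) = split_input(data);
    let once = merge(state, delta);
    let twice = merge(&once, delta);
    once == twice
}

fn main() {
    // A few fixed inputs stand in for the fuzzer-generated corpus.
    let samples: [&[u8]; 4] = [b"", b"\x00", b"\x02abXYZ", b"\xffshort"];
    for &data in samples.iter() {
        assert!(
            check_merge_idempotent(toy_merge, data),
            "not idempotent for {:?}",
            data
        );
    }

    // A naive append-style merge violates the property, and the check notices.
    assert!(!check_merge_idempotent(|s, d| [s, d].concat(), b"\x01ab"));

    println!("idempotence held for {} sample inputs", samples.len());
}
```

Passing the merge function as a parameter keeps the invariant separate from the code under test, so the same check can be pointed at a real merge wrapper once one is available.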
+
+Requirements: `rustup toolchain install nightly` and `cargo install cargo-fuzz`.
+The fuzz targets reach crate-private CRDT internals through the `fuzzing`
+cargo feature of `heph-core`, which exposes thin public wrappers — the feature
+is never enabled in normal builds.
+
+Run all targets briefly (default 60s each), or one target for longer:
+
+```bash
+mise run fuzz        # all targets, 60s each
+mise run fuzz 300    # all targets, 5 min each
+cargo +nightly fuzz run crdt_merge --fuzz-dir crates/heph-core/fuzz -- -max_total_time=3600
+```
+
+Crash artifacts land in `crates/heph-core/fuzz/artifacts//`; the corpus
+accumulates in `crates/heph-core/fuzz/corpus//` (both gitignored).
+Reproduce a crash with
+`cargo +nightly fuzz run --fuzz-dir crates/heph-core/fuzz `.
+
+Tier 2 is deliberately not wired into CI: it needs nightly and meaningful wall
+clock to earn its keep. Run it ad-hoc after touching `crdt.rs`, `extract.rs`,
+or the sync payload path. If it ever moves to CI, a scheduled (not per-push)
+workflow with a persistent corpus is the right shape.
+
+## Why these targets
+
+The high-value surfaces, ranked when this was set up:
+
+1. **`crdt::merge_body`** — decodes untrusted bytes from sync peers; a panic
+   here is a remote-input daemon crash.
+2. **`extract`** — custom scanning logic layered over pulldown-cmark; promotion
+   rewrites body lines based on its output, so misalignment corrupts bodies.
+3. **`wikilink` rewriting** — span arithmetic where off-by-ones hide.
+4. **`datespec`/`quickadd`** — user-typed input parsed inside the daemon
+   process.
+
+Crashes found in dependencies (`yrs`, `rrule`, `pulldown-cmark`) are still
+real `hephd` crashes — handle by validating/catching before the call, and
+report upstream.
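One way to read "catching" a dependency panic is a panic boundary around the call. Below is a minimal std-only sketch, assuming the daemon is built with the default `panic = "unwind"` strategy; `guarded` and `flaky_parse` are hypothetical names, with `flaky_parse` standing in for a call into `yrs`, `rrule`, or `pulldown-cmark`. It is not taken from the heph codebase.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Run `f`, converting a panic into an `Err` that carries the panic message.
/// `AssertUnwindSafe` is a promise by the caller: any state the closure
/// touched may be half-updated after a panic and should be discarded.
fn guarded<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "non-string panic payload".to_string()
        }
    })
}

/// Hypothetical stand-in for a dependency call that panics on some inputs.
fn flaky_parse(input: &str) -> usize {
    if input.contains('\0') {
        panic!("unexpected NUL in input");
    }
    input.len()
}

fn main() {
    // Silence the default panic hook so the caught panic does not print to stderr.
    panic::set_hook(Box::new(|_| {}));

    // Well-formed input passes straight through.
    assert_eq!(guarded(|| flaky_parse("FREQ=DAILY")), Ok(10));

    // A panicking call becomes an error value the caller can log and reject.
    assert_eq!(
        guarded(|| flaky_parse("bad\0input")),
        Err("unexpected NUL in input".to_string())
    );

    println!("panic contained; caller saw an Err");
}
```

Where the failing condition is known, validating the input before the call is the sturdier option: `catch_unwind` has no effect under `panic = "abort"`, and it cannot undo partial mutations made before the panic.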
+
+## Related
+
+- [[v1-prototype-tech-spec]] — testing strategy
+- [[design]] — sync/CRDT rationale
diff --git a/docs/how-to/how-to.md b/docs/how-to/how-to.md
index c20c904..45b62ab 100644
--- a/docs/how-to/how-to.md
+++ b/docs/how-to/how-to.md
@@ -23,3 +23,4 @@ Task-oriented guides for common operations.
 - [[self-update]] — Opt-in `hephd` self-update: poll the forge for new releases and auto-update
 - [[heph-pwa]] — The mobile app: an installable PWA mirror of heph-tui (browse, triage, fast quick-add, voice)
 - [[host-heph-pwa]] — Serve the mobile app from the hub (indri) with OIDC, in the hub/spoke deployment
+- [[fuzz-testing]] — Property tests (proptest, in `cargo test`) and cargo-fuzz targets (`mise run fuzz`) for the parsing/CRDT surfaces