Solution — triage (three-tier)
uv run mercury triage <customers.json> <banks.json>

Emits three tiers (Match, Review, Mismatch) by banding the same total score the binary variant uses. This recovers Fellegi–Sunter’s A₂ “possible link” decision: for borderline links the answer is “escalate to a human” rather than a forced Match or Mismatch.
Decision rule
tier = Match     if total >= REVIEW_HIGH                (total >= 3.5)
       Review    if REVIEW_LOW <= total < REVIEW_HIGH   (1.5 <= total < 3.5)
       Mismatch  if total < REVIEW_LOW                  (total < 1.5)
Constants live in mercury.match: REVIEW_LOW = 1.5, REVIEW_HIGH = 3.5.
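The rule above can be sketched in a few lines of Python. This is an illustration only, not the code in mercury.match: the function name `triage_tier` is hypothetical, and only the two constant names and values come from this document.

```python
REVIEW_LOW = 1.5   # documented value; inclusive lower edge of the Review band
REVIEW_HIGH = 3.5  # documented value; inclusive lower edge of the Match band


def triage_tier(total: float) -> str:
    """Band a total score into Match / Review / Mismatch (hypothetical helper)."""
    if total >= REVIEW_HIGH:
        return "Match"
    if total >= REVIEW_LOW:
        return "Review"
    return "Mismatch"


if __name__ == "__main__":
    # Probe both cutoffs from either side to show which band owns the boundary.
    for total in (3.5, 3.25, 1.5, 1.25):
        print(total, triage_tier(total))
```

Both cutoffs are inclusive on the upper band: a total of exactly 3.5 is a Match and a total of exactly 1.5 is a Review.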
Output format
Total matches: X
Total reviews: Z
Total mismatches: Y
Link 1: Match
Link 2: Review
...
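For illustration, a report in this shape could be assembled from a list of per-link tiers as follows. The helper name `format_report` is hypothetical, and the sketch assumes links are numbered from 1 in input order; the real CLI may build its output differently.

```python
from collections import Counter


def format_report(tiers: list[str]) -> str:
    """Render the three totals followed by one line per link (hypothetical helper)."""
    counts = Counter(tiers)  # tiers that never occur count as 0
    lines = [
        f"Total matches: {counts['Match']}",
        f"Total reviews: {counts['Review']}",
        f"Total mismatches: {counts['Mismatch']}",
    ]
    # Assumption: links are numbered from 1 in the order they were scored.
    lines += [f"Link {i}: {tier}" for i, tier in enumerate(tiers, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(["Match", "Review", "Mismatch"]))
```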
Why this solution exists
The match solution forces binary verdicts out of a scoring engine that natively produces a continuous scalar — information is thrown away at the threshold. On the 9-link fixture, the three links whose fraud-team comments were most hedged are exactly the three whose total scores land near the binary threshold. The banding makes that already-present signal usable.
1. The fraud-team language encodes three tiers, not two
Ranked from least to most inference, i.e. by how far the binary match solution has to go beyond an explicit verdict from the fraud team:
| Link | Fraud team verbatim | Inference load |
|---|---|---|
| 1 | “Looks good!” | Low: explicit positive |
| 3 | “good to go” | Low: explicit positive |
| 7 | “Direct name match and phone match.” | Low: “direct” signals confidence |
| 4 | “I don’t see any connection. Possible fraud?” | Low-medium: leans negative, hedged with “?” |
| 9 | “This doesn’t seem to match at all. Possible fraud?” | Low-medium: same pattern |
| 5 | “The business name matches, user’s matches the business name, and the email matches.” | Medium: no verdict word; we inferred from the enumeration |
| 6 | “Cy is short for Cyril, and the last names match, so that’s probably good.” | Medium-high: “probably” is explicitly non-committal |
| 8 | “The phone numbers match, and the first name of Cyril matches. Probably good.” | Medium-high: same hedge |
| 2 | “The phone number matches, but there’s no other connection. Going to call the customer to ask about this.” | High: explicitly deferred to a human |
“Going to call,” “probably good,” and “Possible fraud?” all have the shape of an A₂ (possible link) decision: the fraud team was deferring the verdict, not making one.
2. Our scoring engine already surfaces that ambiguity
The same links whose language was hedged are the links whose total score lands nearest the binary threshold:
| Link | Fraud hedge | total | Binary (>= 2.5) |
|---|---|---|---|
| 2 | “going to call” | 1.5 | Mismatch (forced) |
| 6 | “probably good” | 2.5 | Match (exactly at the threshold) |
| 8 | “probably good” | 2.75 | Match (just above) |
The other six links are all well away from the threshold — ≥ 4.0 for unambiguous matches, ≤ 1.25 for unambiguous mismatches. The ambiguity isn’t a bug; it’s information we were discarding.
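The margins can be checked directly from the nine totals reported for the fixture in section 3. The sketch below ranks links by absolute distance from the binary threshold of 2.5; the helper name `rank_by_margin` is hypothetical and the totals are copied from this document, not recomputed by the scoring engine.

```python
BINARY_THRESHOLD = 2.5  # gate used by the binary match solution

# Link number -> total score, copied from the fixture table in section 3.
TOTALS = {1: 5.5, 2: 1.5, 3: 4.0, 4: 0.0, 5: 4.0, 6: 2.5, 7: 4.0, 8: 2.75, 9: 1.25}


def rank_by_margin(totals: dict[int, float], threshold: float = BINARY_THRESHOLD) -> list[int]:
    """Return link numbers ordered from closest to farthest from the threshold."""
    # Ties on distance are broken by link number so the order is deterministic.
    return sorted(totals, key=lambda link: (abs(totals[link] - threshold), link))


if __name__ == "__main__":
    for link in rank_by_margin(TOTALS):
        print(f"Link {link}: total={TOTALS[link]}, margin={abs(TOTALS[link] - BINARY_THRESHOLD)}")
```

On these totals the three closest links are 6, 8, and 2, with margins of 0.0, 0.25, and 1.0; the next closest is link 9 at 1.25.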
3. Three-tier output on the 9-link fixture
With REVIEW_LOW = 1.5 and REVIEW_HIGH = 3.5:
| Link | total | Triage tier | Matches fraud-team language? |
|---|---|---|---|
| 1 | 5.5 | Match | ✓ “Looks good!” |
| 2 | 1.5 | Review | ✓ “going to call the customer” |
| 3 | 4.0 | Match | ✓ “good to go” |
| 4 | 0.0 | Mismatch | ✓ “I don’t see any connection” |
| 5 | 4.0 | Match | ✓ all three signals agree |
| 6 | 2.5 | Review | ✓ “probably good” |
| 7 | 4.0 | Match | ✓ “Direct name match and phone match” |
| 8 | 2.75 | Review | ✓ “probably good” |
| 9 | 1.25 | Mismatch | ✓ “doesn’t seem to match at all” |
Totals: 4 Match, 3 Review, 2 Mismatch. Verified by tests/test_solution_triage.py::test_triage_full_output.
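The totals can also be reproduced outside the test suite. The sketch below is not the project's implementation: it bands the scores with the standard-library `bisect_right` rather than an if-chain, and the names `CUTOFFS`, `TIERS`, and `tier_for` are hypothetical. The scores are copied from the table above.

```python
from bisect import bisect_right
from collections import Counter

CUTOFFS = [1.5, 3.5]                     # REVIEW_LOW, REVIEW_HIGH
TIERS = ["Mismatch", "Review", "Match"]  # band below, between, and at/above the cutoffs

# Link number -> total score, copied from the table above.
TOTALS = {1: 5.5, 2: 1.5, 3: 4.0, 4: 0.0, 5: 4.0, 6: 2.5, 7: 4.0, 8: 2.75, 9: 1.25}


def tier_for(total: float) -> str:
    """Map a total to its tier; bisect_right places a score equal to a cutoff in the upper band."""
    return TIERS[bisect_right(CUTOFFS, total)]


tiers = {link: tier_for(total) for link, total in TOTALS.items()}
counts = Counter(tiers.values())

if __name__ == "__main__":
    print(counts["Match"], counts["Review"], counts["Mismatch"])
```

Using `bisect_right` keeps the cutoffs in one sorted list, so adding a fourth band would only mean extending `CUTOFFS` and `TIERS`.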
Relation to Fellegi–Sunter
This is F&S’s original framing (matching-approach): two cutoffs on a likelihood-ratio-ordered statistic produce three decisions, A₁ (link), A₂ (possible link), and A₃ (non-link). We chose the cutoffs to fit the fraud team’s own hedging language on the 9-link fixture; F&S would derive them from the target false-match and false-non-match error rates μ and λ. The structure is the same; our constants are heuristic.
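For reference, the F&S rule is usually written as below. The symbols γ (comparison vector), m and u (probabilities of γ among true matches M and true non-matches U), and the threshold names T_μ and T_λ follow common textbook notation and are not defined elsewhere in this document. In our setting `total` plays the role of R, REVIEW_HIGH the role of T_μ, and REVIEW_LOW the role of T_λ.

```latex
R(\gamma) = \frac{m(\gamma)}{u(\gamma)}, \qquad
d(\gamma) =
\begin{cases}
A_1 \ (\text{link})          & \text{if } R(\gamma) > T_\mu \\
A_2 \ (\text{possible link}) & \text{if } T_\lambda \le R(\gamma) \le T_\mu \\
A_3 \ (\text{non-link})      & \text{if } R(\gamma) < T_\lambda
\end{cases}
\qquad
\mu = P(A_1 \mid U), \quad \lambda = P(A_3 \mid M)
```

Presentations differ on how ties at the cutoffs are handled; the decision rule in this document assigns a total exactly equal to REVIEW_HIGH to Match.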