Matching Approach — Fellegi–Sunter and Our Deviations
The problem
Given a Mercury customer and a third-party bank link that claims to belong to that customer, decide whether the link is legitimate (Match) or warrants a fraud review (Mismatch). Three signals — personal/business name, email, phone — are available, each of which may be empty, noisy, or formatted inconsistently.
This is a record linkage problem: comparing records from two sources and deciding whether they refer to the same real-world entity. Our inputs are already joined by `mercuryCompanyId`, so we skip the hardest part of the canonical problem (candidate pair generation / blocking) and focus on the agreement decision.
The Fellegi–Sunter framework
Fellegi & Sunter (1969), A Theory for Record Linkage, JASA 64(328), formalized the problem we’re solving. PDFs:
The framework in short:
- For each candidate pair, compute a comparison vector γ whose components encode field-level agreement (agree, disagree, partial, missing).
- Under the match hypothesis M, γ has probability distribution `m(γ)`. Under the non-match hypothesis U, it has `u(γ)`.
- Order the possible γ values by the likelihood ratio `m(γ)/u(γ)` and apply two thresholds: above the upper → Match (A₁), below the lower → Non-Match (A₃), in between → Possible Link (A₂, escalate to a human).
- The thresholds are chosen to hit target false-match and false-non-match error rates μ and λ while sending as few pairs as possible to A₂.
- Corollary 1 assumes fields are conditionally independent given match status. Under that assumption, `log(m(γ)/u(γ))` decomposes into a sum of per-field weights: `log(m_k / u_k)` when field k agrees and `log((1 - m_k) / (1 - u_k))` when it disagrees, where `m_k` and `u_k` are the probabilities that field k agrees under M and U respectively.
That decomposition is where the familiar “field weights that sum to a match score” structure comes from.
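To make the decomposition and the two-threshold rule concrete, here is a minimal Python sketch for three binary fields. The m/u values are made up for illustration (not estimated from our data), and `thresholds` is a simplified version of the F&S construction that skips their randomized handling of boundary patterns. All function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical per-field probabilities for illustration, NOT estimated from our data:
# m_k = P(field k agrees | match), u_k = P(field k agrees | non-match).
FIELDS = ("name", "email", "phone")
M = {"name": 0.90, "email": 0.80, "phone": 0.85}
U = {"name": 0.01, "email": 0.001, "phone": 0.005}


def field_weight(field: str, agrees: bool) -> float:
    """Corollary-1 weight: log(m_k/u_k) on agreement, log((1-m_k)/(1-u_k)) otherwise."""
    m, u = M[field], U[field]
    return math.log(m / u) if agrees else math.log((1 - m) / (1 - u))


def match_weight(gamma: tuple) -> float:
    """log(m(γ)/u(γ)) as a sum of per-field weights (conditional independence)."""
    return sum(field_weight(f, a) for f, a in zip(FIELDS, gamma))


def pattern_prob(gamma: tuple, probs: dict) -> float:
    """P(γ) under M or U, also assuming conditional independence."""
    p = 1.0
    for f, a in zip(FIELDS, gamma):
        p *= probs[f] if a else 1 - probs[f]
    return p


def thresholds(mu: float, lam: float) -> tuple:
    """Upper/lower cutoffs with P(A1 | U) <= mu and P(A3 | M) <= lam.

    Rank every γ by likelihood ratio, grow A1 from the top and A3 from the
    bottom until the error budget would be exceeded.
    """
    ranked = sorted(product((True, False), repeat=len(FIELDS)), key=match_weight, reverse=True)
    upper, false_match = math.inf, 0.0
    for g in ranked:
        if false_match + pattern_prob(g, U) > mu:
            break
        false_match += pattern_prob(g, U)
        upper = match_weight(g)
    lower, false_non_match = -math.inf, 0.0
    for g in reversed(ranked):
        if false_non_match + pattern_prob(g, M) > lam:
            break
        false_non_match += pattern_prob(g, M)
        lower = match_weight(g)
    return upper, lower


def decide(gamma: tuple, upper: float, lower: float) -> str:
    """Three-way F&S decision: A1 (match), A2 (possible link), A3 (non-match)."""
    w = match_weight(gamma)
    if w >= upper:
        return "A1 Match"
    if w <= lower:
        return "A3 Non-Match"
    return "A2 Possible Link"


upper, lower = thresholds(mu=0.001, lam=0.01)
print(decide((False, False, True), upper, lower))  # phone agreement only → A2 Possible Link
```

With these made-up numbers, the upper cutoff lands on the name-and-phone pattern and the lower cutoff on all-disagree, so phone-only agreement falls into A₂ rather than being accepted or rejected outright.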
Where our implementation is inspired by F&S
- Per-field agreement scores, summed into a scalar decision statistic: our `score_name + score_email + score_phone` weighted sum is exactly the F&S Corollary-1 shape.
- Threshold on the combined statistic: the binary version of the F&S L(μ, λ, Γ) rule.
- Weights reflect discriminating power: `NAME_WEIGHT = 2.5` being the largest encodes the judgment that full-name agreement is the most specific signal, which is what a larger `log(m/u)` for fields that rarely agree by chance (small u) would formally express.
- A comparison function that collapses messy records into agreement codes: `tokenize_name`, `normalize_phone`, and `normalize_email` are the γ function. Our scoring-phone, scoring-email, and scoring-name reference cards describe each.
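A minimal sketch of how a hand-weighted version fits that shape. Only `NAME_WEIGHT = 2.5` comes from this doc; `EMAIL_WEIGHT`, `PHONE_WEIGHT`, `THRESHOLD`, the helpers `phone_key`, `email_key`, `name_tokens`, `exact_score`, `name_score`, and `is_match`, and the `"names"`/`"emails"`/`"phones"` list-of-strings record shape are hypothetical simplified stand-ins, not the engine's actual `normalize_phone`, `normalize_email`, or `tokenize_name` logic.

```python
import re

# NAME_WEIGHT is from this doc; the other weights and THRESHOLD are
# hypothetical placeholders, not the engine's real values.
NAME_WEIGHT = 2.5
EMAIL_WEIGHT = 2.0
PHONE_WEIGHT = 1.5
THRESHOLD = 3.0


def phone_key(raw: str) -> str:
    """Hypothetical stand-in for normalize_phone: digits only, last 10 kept."""
    return re.sub(r"\D", "", raw)[-10:]


def email_key(raw: str) -> str:
    """Hypothetical stand-in for normalize_email: trim and lowercase."""
    return raw.strip().lower()


def name_tokens(raw: str) -> frozenset:
    """Hypothetical stand-in for tokenize_name: lowercase ASCII-letter tokens."""
    return frozenset(re.findall(r"[a-z]+", raw.lower()))


def exact_score(left: list, right: list, key) -> float:
    """1.0 if any normalized value appears on both sides, else 0.0 (empty lists give 0.0)."""
    left_keys = {key(x) for x in left} - {""}
    right_keys = {key(x) for x in right} - {""}
    return 1.0 if left_keys & right_keys else 0.0


def name_score(left: list, right: list) -> float:
    """1.0 for an identical token set, 0.5 for any shared token, else 0.0."""
    best = 0.0
    for a in map(name_tokens, left):
        for b in map(name_tokens, right):
            if a and a == b:
                return 1.0
            if a & b:
                best = 0.5
    return best


def is_match(customer: dict, link: dict) -> bool:
    """Weighted sum of per-field scores compared against a single threshold."""
    total = (
        NAME_WEIGHT * name_score(customer["names"], link["names"])
        + EMAIL_WEIGHT * exact_score(customer["emails"], link["emails"], email_key)
        + PHONE_WEIGHT * exact_score(customer["phones"], link["phones"], phone_key)
    )
    return total >= THRESHOLD
```

Note that an empty list scores 0.0 here exactly as a contradicting value would, which is the "missing ≡ disagreement" behavior discussed below.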
Where we deviate (knowingly)
- No probability model. Our weights are hand-picked constants. F&S weights are log-likelihood ratios estimated from data. Consequence: no false-match / false-non-match rate guarantees.
- No A₂ “possible link” tier. The prompt asks for a binary decision, so we collapsed it. F&S would route link 2 (phone agreement only) to manual review rather than rejecting it outright.
- Threshold chosen by fitting 9 labeled examples. F&S derives thresholds from target error rates. We overfit a tiny sample.
- Coarse within-field granularity. F&S’s γ_k can be a rich categorical variable (“agree”, “partial”, “disagree”, “missing”). Ours is effectively {0.0, 0.5, 1.0}. The 0.5 “partial name” tier is a crude proxy for F&S’s partial-agreement codes.
- No frequency-based weights (F&S Corollary 2). Agreement on the surname `"Smith"` and agreement on `"Windhorst"` score identically for us. F&S’s value-specific weights would make agreement on a rare surname much more compelling than agreement on a common one, which would sharpen both link 8 (`"Cyril"` alone) and link 9 (the spurious `"media"` coincidence).
- Conditional-independence assumption unstated. We implicitly assume fields are conditionally independent given match status; F&S calls this out explicitly as a corollary assumption.
- Missing ≡ disagreement. Empty arrays produce score 0.0, conflating “no data” with “contradictory data.” F&S models missing as its own γ realization with its own m/u weights.
- No blocking. Skipped because our input is pre-joined by `mercuryCompanyId`.
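Two of these deviations are easy to see in code. The sketch below (hypothetical function names and illustrative m/u values throughout) gives one field a three-level γ_k where “missing” is its own level, and computes a simplified value-specific agreement weight. The reasoning behind the latter: if a fraction f_v of records carry value v, true matches agree on v at a rate roughly proportional to f_v(1 − e) and unrelated pairs roughly f_v², so the per-value ratio is about (1 − e)/f_v for an assumed error rate e.

```python
import math
from collections import Counter

# Illustrative m/u values for one field's three γ_k levels (hypothetical, not
# estimated). Each distribution sums to 1 across its three levels.
M_LEVELS = {"agree": 0.85, "disagree": 0.05, "missing": 0.10}
U_LEVELS = {"agree": 0.01, "disagree": 0.89, "missing": 0.10}


def field_level(left: list, right: list) -> str:
    """Hypothetical comparison: empty input is 'missing', not 'disagree'."""
    if not left or not right:
        return "missing"
    return "agree" if set(left) & set(right) else "disagree"


def level_weight(level: str) -> float:
    """log(m/u) for one γ_k level. Because missingness is assumed equally likely
    under M and U here, 'missing' contributes 0 instead of a disagreement penalty."""
    return math.log(M_LEVELS[level] / U_LEVELS[level])


def value_agreement_weight(value: str, counts: Counter, error_rate: float = 0.05) -> float:
    """Simplified value-specific agreement weight, log((1 - e) / f_v).

    f_v is the value's relative frequency in a reference population; unseen
    values are given a count of 1 so the weight stays finite.
    """
    f_v = max(counts[value], 1) / sum(counts.values())
    return math.log((1 - error_rate) / f_v)


surnames = Counter({"smith": 900, "jones": 99, "windhorst": 1})  # toy counts
print(round(value_agreement_weight("smith", surnames), 2))      # 0.05
print(round(value_agreement_weight("windhorst", surnames), 2))  # 6.86
```

With these toy counts, agreeing on a common surname barely moves the score while agreeing on a rare one contributes a large positive weight, which is the behavior the hand-picked constants cannot express.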
Why we built it from scratch
Production record linkage libraries, particularly splink and recordlinkage, implement F&S properly: EM-estimated m/u probabilities, blocking, all of it. For the interview, the exercise is architecture and reasoning rather than library fluency, so we rolled the decomposition ourselves. See `interview/plan.md` for a post-interview stretch goal: porting the engine to splink on a separate branch. Comparing hand-tuned and EM-estimated weights on the 9-link fixture would be genuinely interesting.
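For a feel of what “EM-estimated m/u probabilities” means, here is a small plain-Python sketch of EM for 0/1 agreement vectors under the conditional-independence assumption. It is not the splink or recordlinkage API (both ship their own estimators); the function name `em_fellegi_sunter`, its starting values, and the iteration count are hypothetical choices.

```python
from collections import Counter


def _clamp(x: float, eps: float = 1e-6) -> float:
    """Keep a probability strictly inside (0, 1) so log(m_k / u_k) stays finite."""
    return min(max(x, eps), 1 - eps)


def em_fellegi_sunter(patterns, n_iter=100, m0=0.9, u0=0.1, p0=0.1):
    """Estimate m_k, u_k and match prevalence p from unlabeled 0/1 agreement vectors.

    Assumes conditional independence given match status. Starting with m0 > u0
    anchors the first latent class as "match".
    """
    counts = Counter(tuple(g) for g in patterns)  # collapse identical γ vectors
    n = sum(counts.values())
    k = len(next(iter(counts)))
    m, u, p = [m0] * k, [u0] * k, p0
    for _ in range(n_iter):
        # E-step: posterior P(match | γ) for each distinct pattern.
        post = {}
        for g in counts:
            pm, pu = p, 1 - p
            for j, agrees in enumerate(g):
                pm *= m[j] if agrees else 1 - m[j]
                pu *= u[j] if agrees else 1 - u[j]
            post[g] = pm / (pm + pu)
        # M-step: re-estimate p, m_k, u_k from count-weighted soft assignments.
        n_match = sum(counts[g] * post[g] for g in counts)
        p = n_match / n
        m = [_clamp(sum(counts[g] * post[g] * g[j] for g in counts) / n_match) for j in range(k)]
        u = [_clamp(sum(counts[g] * (1 - post[g]) * g[j] for g in counts) / (n - n_match)) for j in range(k)]
    return m, u, p
```

Plugging the returned `m` and `u` into `log(m_k / u_k)` and `log((1 - m_k) / (1 - u_k))` yields data-driven field weights in place of hand-picked constants. On a 9-link fixture, though, EM would have very little to work with, so that comparison is best read as a sanity check rather than a calibration.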