Matching Approach — Fellegi–Sunter and Our Deviations

The problem

Given a Mercury customer and a third-party bank link that claims to belong to that customer, decide whether the link is legitimate (Match) or warrants a fraud review (Mismatch). Three signals — personal/business name, email, phone — are available, each of which may be empty, noisy, or formatted inconsistently.

This is a record linkage problem: comparing records from two sources and deciding whether they refer to the same real-world entity. Our inputs are already joined by mercuryCompanyId, so we skip the hardest part of the canonical problem (candidate pair generation, also called blocking) and focus on the agreement decision.

The Fellegi–Sunter framework

Fellegi & Sunter (1969), “A Theory for Record Linkage,” JASA 64(328), formalized the problem we’re solving.

The framework in short:

  1. For each candidate pair, compute a comparison vector γ whose components encode field-level agreement (agree, disagree, partial, missing).
  2. Under the match hypothesis M, γ has probability distribution m(γ). Under the non-match hypothesis U, it has u(γ).
  3. Order the possible γ values by the likelihood ratio m(γ)/u(γ) and apply two thresholds: above the upper → Match (A₁), below the lower → Non-Match (A₃), in between → Possible Link (A₂, escalate to a human).
  4. The thresholds are chosen to hit target false-match and false-non-match error rates μ and λ.
  5. Corollary 1 assumes fields are conditionally independent given match status. Under that assumption, log(m(γ)/u(γ)) decomposes into a sum of per-field weights: log(m_k / u_k) when field k agrees, log((1 - m_k) / (1 - u_k)) when it disagrees.

That decomposition is where the familiar “field weights that sum to a match score” structure comes from.
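
As a rough illustration of steps 3 and 5, the sketch below computes per-field log-likelihood weights and applies the two-threshold rule. The m/u probabilities, the thresholds, and the function names are all made up for the example; in F&S proper the probabilities are estimated from data and the thresholds follow from the target error rates μ and λ.

```python
import math

# Illustrative m/u probabilities; these numbers are made up, not estimated from data.
M_PROBS = {"name": 0.95, "email": 0.90, "phone": 0.85}   # P(field agrees | match)
U_PROBS = {"name": 0.01, "email": 0.005, "phone": 0.02}  # P(field agrees | non-match)


def field_weight(m_k: float, u_k: float, agrees: bool) -> float:
    """Log-likelihood-ratio weight contributed by one field."""
    if agrees:
        return math.log(m_k / u_k)
    return math.log((1 - m_k) / (1 - u_k))


def match_score(agreements: dict[str, bool]) -> float:
    """Sum of per-field weights; only valid under conditional independence."""
    return sum(
        field_weight(M_PROBS[k], U_PROBS[k], agrees)
        for k, agrees in agreements.items()
    )


def classify(score: float, upper: float, lower: float) -> str:
    """Three-way decision: A1 (match), A2 (possible link), A3 (non-match)."""
    if score > upper:
        return "A1"
    if score < lower:
        return "A3"
    return "A2"


# Hypothetical thresholds, chosen only to make the example concrete.
phone_only = {"name": False, "email": False, "phone": True}
print(classify(match_score(phone_only), upper=5.0, lower=-2.0))
```

With these invented numbers, a pair that agrees on phone alone falls between the two thresholds and lands in A2, the manual-review tier.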

Where our implementation is inspired by F&S

  • Per-field agreement scores, summed into a scalar decision statistic — our score_name + score_email + score_phone weighted sum is exactly the F&S Corollary-1 shape.
  • Threshold on the combined statistic — the binary version of the F&S L(μ, λ, Γ) rule.
  • Weights reflect discriminating power: NAME_WEIGHT = 2.5 is the largest weight, encoding the judgment that full name agreement is the most specific signal. Formally, that is what a larger log(m/u) expresses for a field that rarely agrees by chance.
  • A comparison function that collapses messy records into agreement codes: tokenize_name, normalize_phone, and normalize_email together play the role of the γ function. Our scoring-phone, scoring-email, and scoring-name reference cards describe each.
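
The shape of the decision statistic described above can be sketched as follows. Only NAME_WEIGHT = 2.5 comes from this document; EMAIL_WEIGHT, PHONE_WEIGHT, MATCH_THRESHOLD, the function names, and the choice of >= at the boundary are placeholders assumed for illustration, not the values or names in the real engine.

```python
NAME_WEIGHT = 2.5       # stated in this document
EMAIL_WEIGHT = 1.5      # hypothetical placeholder
PHONE_WEIGHT = 1.0      # hypothetical placeholder
MATCH_THRESHOLD = 2.0   # hypothetical placeholder


def combined_score(score_name: float, score_email: float, score_phone: float) -> float:
    """Weighted sum of per-field agreement scores, each in {0.0, 0.5, 1.0}."""
    return (
        NAME_WEIGHT * score_name
        + EMAIL_WEIGHT * score_email
        + PHONE_WEIGHT * score_phone
    )


def decide(score_name: float, score_email: float, score_phone: float) -> str:
    """Binary decision: a single threshold on the combined statistic."""
    total = combined_score(score_name, score_email, score_phone)
    return "Match" if total >= MATCH_THRESHOLD else "Mismatch"
```

Under these placeholder constants, full name agreement alone clears the threshold while phone agreement alone does not.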

Where we deviate (knowingly)

  1. No probability model. Our weights are hand-picked constants. F&S weights are log-likelihood ratios estimated from data. Consequence: no false-match / false-non-match rate guarantees.
  2. No A₂ “possible link” tier. The prompt asks for binary, so we collapsed it. F&S would route link 2 (phone agreement only) to manual review rather than flat-rejecting it.
  3. Threshold chosen by fitting 9 labeled examples. F&S derives thresholds from target error rates. We overfit a tiny sample.
  4. Coarse within-field granularity. F&S’s γ_k can be a rich categorical code (“agree”, “partial”, “disagree”, “missing”). Ours is effectively {0.0, 0.5, 1.0}. The 0.5 “partial name” tier is a crude proxy for F&S’s partial-agreement codes.
  5. No frequency-based weights (F&S Corollary 2). Agreement on the surname "Smith" and agreement on "Windhorst" score identically for us. F&S’s value-specific weights would make the rare-surname case much more compelling — which would sharpen both link 8 ("Cyril" alone) and link 9 (spurious "media" coincidence).
  6. Conditional-independence assumption unstated. Summing per-field scores implicitly assumes the fields are independent given match status; F&S states that assumption explicitly.
  7. Missing ≡ disagreement. Empty arrays produce score 0.0, conflating “no data” with “contradictory data.” F&S models missing as its own γ realization with its own m/u weights.
  8. No blocking. Skipped because our input is pre-joined by mercuryCompanyId.
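
To make deviations 5 and 7 concrete, here is a minimal sketch of what closing them might look like. It is not our implementation. It assumes field values are already normalized, treats missing data as neutral (weight 0.0, which corresponds to missingness being equally likely under M and U), and uses a common simplification for value-specific weights: if a value has relative frequency p, then m ≈ p and u ≈ p², so agreement on it is worth about log(1/p). All names are hypothetical.

```python
import math
from collections import Counter
from enum import Enum


class Level(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    MISSING = "missing"


def compare_field(left: list[str], right: list[str]) -> Level:
    """Compare two lists of already-normalized values for one field."""
    if not left or not right:
        return Level.MISSING  # no data is not the same as contradictory data
    return Level.AGREE if set(left) & set(right) else Level.DISAGREE


def level_weight(level: Level, m_k: float, u_k: float) -> float:
    """Per-field weight; missing contributes nothing either way (an assumption)."""
    if level is Level.MISSING:
        return 0.0
    if level is Level.AGREE:
        return math.log(m_k / u_k)
    return math.log((1 - m_k) / (1 - u_k))


def frequency_agreement_weight(value: str, reference_values: list[str]) -> float:
    """Approximate agreement weight log(1 / p) for a specific value."""
    p = Counter(reference_values)[value] / len(reference_values)
    if p == 0:
        raise ValueError(f"{value!r} does not occur in the reference values")
    return math.log(1 / p)
```

With a reference list in which "smith" is common and "windhorst" is rare, frequency_agreement_weight returns a larger weight for "windhorst", which is the behavior deviation 5 says we lack.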

Why we built it from scratch

Production record linkage libraries, particularly splink and recordlinkage, implement F&S properly: EM-estimated m/u probabilities, blocking, all of it. For the interview, the exercise is about architecture and reasoning rather than library fluency, so we built the decomposition ourselves. See interview/plan.md for a post-interview stretch goal of porting the engine to splink on a separate branch; comparing hand-tuned and EM-estimated weights on the 9-link fixture would be a useful check on our hand-picked constants.