Why Multiple Solution Variants
The interview prompt asks a specific question: is this link a Match or a Mismatch? We answer it with mercury match. But the prompt also invites improvement — “propose, or start implementing, a better system for determining a match” — and the scoring engine naturally produces richer output than a single bit.
Rather than pick one “best” answer and discard the others, the repo carries multiple solution variants as siblings. Each variant is a different reading of the same underlying decision problem. The catalog lives at solutions.
Why this shape?
Three concrete benefits.
1. The scoring engine is separate from the decision rule
mercury.match.evaluate returns a LinkResult — per-field scores plus a continuous total. That data structure doesn’t commit to a verdict shape. The threshold, the tiering, the output format — those are all variant-specific concerns that live in the CLI layer.
This is deliberate. Fellegi–Sunter (matching-approach) is explicit that the decision statistic and the decision rule are separate things: you compute m(γ)/u(γ) once, and then you can cut it with whatever error-rate policy you need. Holding that separation in the code makes it cheap to layer a new policy on top.
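For reference, the classic Fellegi–Sunter rule makes this split explicit. The statistic is the likelihood ratio of the comparison vector γ. The policy is a pair of thresholds chosen from target error rates. Reading mercury’s total as R(γ), or as log R(γ), is an assumption made here for illustration. This page doesn’t claim the scorer is a calibrated likelihood ratio.

```latex
R(\gamma) = \frac{m(\gamma)}{u(\gamma)},
\qquad m(\gamma) = P(\gamma \mid M), \quad u(\gamma) = P(\gamma \mid U)

\text{decision}(\gamma) =
\begin{cases}
\text{link} & R(\gamma) \ge T_\mu \\
\text{possible link (clerical review)} & T_\lambda < R(\gamma) < T_\mu \\
\text{non-link} & R(\gamma) \le T_\lambda
\end{cases}
```

T_μ and T_λ are set from the tolerated false-match rate μ and false-non-match rate λ. A binary verdict is the degenerate case T_λ = T_μ, up to how ties at the threshold are broken. A three-tier verdict keeps the middle band. Both cut the same R(γ).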
2. Each variant is a hypothesis about what “correct” means
The binary solution-match answers “is this a match?” The three-tier solution-triage answers “how confident are we?” A future variant could answer “what would happen at a 0.1% false-match rate?” or “what’s the expected investigator workload?”
None of those questions has a single correct answer; each comes out of a different operational context. Making the variants first-class records that we know this, rather than hiding it behind a single output.
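The sketch below shows three variants reading the same total, under stated assumptions. `LinkResult` is a hypothetical stand-in for mercury’s class, and only `total` matters here. The function names and all cutoff values are illustrative, not the repo’s real ones. The third function is one hypothetical way to approach the 0.1% false-match-rate question. It needs totals for links already known to be non-matches.

```python
import math
from dataclasses import dataclass


# Hypothetical stand-in for mercury's LinkResult: per-field scores plus a
# continuous total. The real class may use different names or carry more data.
@dataclass(frozen=True)
class LinkResult:
    field_scores: dict[str, float]
    total: float


# Variant "is this a match?": one illustrative cutoff, two outcomes.
def decide_binary(result: LinkResult, cutoff: float = 0.70) -> str:
    return "Match" if result.total >= cutoff else "Mismatch"


# Variant "how confident are we?": two illustrative cutoffs, three outcomes.
# Totals strictly between the cutoffs go to a human Review queue.
def decide_triage(result: LinkResult, lower: float = 0.55, upper: float = 0.85) -> str:
    if result.total >= upper:
        return "Match"
    if result.total <= lower:
        return "Mismatch"
    return "Review"


# Hypothetical variant "what cutoff keeps false matches at or below a target
# rate?". Returns the smallest float cutoff c such that, under `total >= c`,
# at most floor(target * n) of the n known non-matches would be called Match.
def cutoff_for_false_match_rate(non_match_totals: list[float], target: float) -> float:
    if not non_match_totals:
        raise ValueError("need at least one known non-match")
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a rate in [0, 1]")
    ranked = sorted(non_match_totals, reverse=True)
    allowed = math.floor(target * len(ranked))  # false matches we can tolerate
    if allowed >= len(ranked):
        return -math.inf  # every non-match may pass, so any cutoff meets the target
    # Sit just above the (allowed + 1)-th highest non-match total, so only the
    # strictly higher totals (at most `allowed` of them) can pass.
    return math.nextafter(ranked[allowed], math.inf)
```

All three functions consume the same `total`. Moving `cutoff` is tuning the binary variant. Asking for a false-match-rate guarantee is a different question, which is why it gets its own function rather than a new default.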
3. It makes disagreements with the ground truth legible
When solution-match says “Match” but solution-triage says “Review,” that’s informative: the binary verdict only reflects which side of the threshold the score fell on; it is not a confident decision. The triage card (solution-triage) makes the case that on the 9-link fixture, our three threshold-straddling verdicts are exactly the three links where the fraud team itself hedged. That correspondence would be invisible in a single-output solution.
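A cross-variant check of the kind the triage card describes can be sketched as follows. Run both rules over the same totals and collect the links triage sends to Review. Then compare that set with the links a human labeler hedged on. `ScoredLink`, the cutoffs, the link ids and the `team_hedged` input are all hypothetical. The real fixture format isn’t described here, and the test data is made up, not the 9-link fixture.

```python
from dataclasses import dataclass


# Hypothetical, simplified scored link: an id plus the scorer's total.
@dataclass(frozen=True)
class ScoredLink:
    link_id: str
    total: float


# Illustrative cutoffs; not mercury's real values.
def binary(total: float, cutoff: float = 0.70) -> str:
    return "Match" if total >= cutoff else "Mismatch"


def triage(total: float, lower: float = 0.55, upper: float = 0.85) -> str:
    if total >= upper:
        return "Match"
    if total <= lower:
        return "Mismatch"
    return "Review"


def hedged_links(links: list[ScoredLink]) -> list[tuple[str, str, str]]:
    """(id, binary verdict, triage verdict) for every link triage sends to Review.

    For these links the binary verdict only says which side of its single
    cutoff the total landed on.
    """
    return [
        (link.link_id, binary(link.total), triage(link.total))
        for link in links
        if triage(link.total) == "Review"
    ]


def compare_to_hedges(links: list[ScoredLink], team_hedged: set[str]) -> dict[str, set[str]]:
    """Overlap between triage's Review set and the links a human labeler hedged on."""
    review = {link_id for link_id, _, _ in hedged_links(links)}
    return {
        "both": review & team_hedged,
        "model_only": review - team_hedged,
        "team_only": team_hedged - review,
    }
```

Empty `model_only` and `team_only` sets would be the kind of correspondence the triage card reports. A single binary output has no Review set to compare against in the first place.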
When to add a variant vs. tune an existing one
- Tune an existing variant if the change is just moving a cutoff or renaming a field — it’s still answering the same question, just with different parameters.
- Add a new variant if the change is answering a different question. Confidence bands, ranked recommendations, operational-cost-sensitive routing, or a splink-powered probabilistic rebuild — those are all new questions, and each deserves its own reference card alongside the others.
The mechanics live in solutions under “Adding a new variant.”
What’s not a solution variant
Changes to the scoring engine itself — new field scorers, different tokenization, nickname coverage improvements — are not variants. They change the total value every variant consumes, and they belong in the scorer reference cards (scoring-phone, scoring-email, scoring-name, scoring-nicknames, scoring-combiner) and, where relevant, in test fixtures. A variant is strictly a different way of rendering total into a decision.