Scoring — Name
mercury.score.score_name(link_names, customer_names, *, nicknames=None) -> float
Tokenization (mercury.normalize.tokenize_name)
- Lowercase.
- Remove apostrophes in place so "Ram's" and "Rams" produce the same token.
- Split on any non-alphanumeric run.
- Drop:
  - single-character tokens (middle initials like "b" in "John B. Smith");
  - personal titles: mr mrs ms miss mx dr sir madam prof;
  - corporate suffixes: inc incorporated llc llp ltd limited corp corporation co gmbh;
  - generational suffixes: jr sr ii iii iv.
- If a nicknames mapping is supplied, each surviving token is replaced by its canonical form (see scoring-nicknames).
The corporate-suffix stoplist is deliberately narrow. Industry words like "technologies" and "media" stay as real tokens so that, e.g., "InfoLinks Technologies" ↔ "InfoLinks Technologies, Inc." compares favourably.
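The tokenization steps can be sketched roughly as follows. This is an illustration built only from the rules listed above, not the actual body of mercury.normalize.tokenize_name. The function name tokenize_name_sketch, the set return type, and the assumed shape of the nicknames mapping (token to canonical token) are all assumptions made for the example.

```python
import re

# Stoplists copied from the rules above.
TITLES = {"mr", "mrs", "ms", "miss", "mx", "dr", "sir", "madam", "prof"}
CORPORATE = {"inc", "incorporated", "llc", "llp", "ltd", "limited",
             "corp", "corporation", "co", "gmbh"}
GENERATIONAL = {"jr", "sr", "ii", "iii", "iv"}
STOPLIST = TITLES | CORPORATE | GENERATIONAL


def tokenize_name_sketch(name, nicknames=None):
    """Hypothetical stand-in for tokenize_name; returns a set of tokens."""
    text = name.lower()
    # Remove apostrophes in place (straight and curly) so "Ram's" == "Rams".
    text = text.replace("'", "").replace("\u2019", "")
    # Split on any run of non-alphanumeric characters (underscore included).
    tokens = [t for t in re.split(r"[\W_]+", text) if t]
    # Drop single-character tokens and stoplisted words.
    tokens = [t for t in tokens if len(t) > 1 and t not in STOPLIST]
    # Assumed mapping shape: token -> canonical token.
    if nicknames:
        tokens = [nicknames.get(t, t) for t in tokens]
    return set(tokens)


print(tokenize_name_sketch("InfoLinks Technologies, Inc."))
print(tokenize_name_sketch("John B. Smith"))
```

With this sketch, "InfoLinks Technologies" and "InfoLinks Technologies, Inc." reduce to the same two tokens, which is the behaviour the narrow corporate stoplist is meant to preserve.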
Agreement rule
For every (link name × customer name) pair, intersect the token sets and apply:
| Overlap | Score |
|---|---|
| 2 or more tokens | 1.0 |
| exactly 1 token | 0.5 |
| 0 tokens | 0.0 |
The returned score is the max across all pairs — any single pairing that hits 1.0 is sufficient; otherwise the best partial-match level is returned.
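A minimal sketch of the agreement rule, assuming a simplified stand-in tokenizer (the stoplists and nickname step are left out to keep it short). The names score_name_sketch, pair_score and _tokens are hypothetical, and returning 0.0 for empty input is an assumption the text above does not confirm.

```python
import re
from itertools import product


def _tokens(name):
    # Stand-in tokenizer: lowercase, strip apostrophes, split on
    # non-alphanumeric runs, drop single-character tokens.
    text = name.lower().replace("'", "").replace("\u2019", "")
    return {t for t in re.split(r"[\W_]+", text) if len(t) > 1}


def pair_score(a, b):
    """Map the size of the token-set intersection to the tiers in the table."""
    overlap = len(a & b)
    if overlap >= 2:
        return 1.0
    if overlap == 1:
        return 0.5
    return 0.0


def score_name_sketch(link_names, customer_names):
    link_sets = [_tokens(n) for n in link_names]
    customer_sets = [_tokens(n) for n in customer_names]
    # Best score over every (link name x customer name) pair.
    # default=0.0 for empty input is an assumption.
    return max(
        (pair_score(a, b) for a, b in product(link_sets, customer_sets)),
        default=0.0,
    )


print(score_name_sketch(["InfoLinks Technologies"],
                        ["Jane Doe", "InfoLinks Technologies, Inc."]))
print(score_name_sketch(["IN MEDIA RES PUBLISHING"], ["LASSAD MEDIA INC"]))
```

The first call finds a two-token overlap on one pairing and returns the top tier. The second shares only "media" and stays at the partial tier. "Jane Doe" is invented sample data.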
Candidate construction
mercury.match.evaluate assembles customer_names as the union of:
- f"{user.firstName} {user.lastName}" for every user on the record;
- the company’s tradeName;
- the company’s legalName.
This is what makes link 1 match: "InfoLinks Technologies" compares against "InfoLinks Technologies, Inc." (the legal name, corporate suffix stripped) and lands at 1.0 despite the personal name mismatching.
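The candidate list might be assembled along these lines. The record shape (plain dicts with firstName, lastName, tradeName and legalName keys), the function name build_customer_names, and the choice to skip missing company names are assumptions for illustration. The real mercury.match.evaluate may structure this differently.

```python
def build_customer_names(users, company):
    """Hypothetical helper: union of user full names and company names."""
    candidates = [f"{u['firstName']} {u['lastName']}" for u in users]
    candidates.append(company.get("tradeName"))
    candidates.append(company.get("legalName"))
    # dict.fromkeys removes duplicates while keeping first-seen order;
    # empty or missing names are skipped (an assumption).
    return list(dict.fromkeys(c for c in candidates if c))


# Invented sample record, for illustration only.
users = [{"firstName": "Jane", "lastName": "Doe"}]
company = {"tradeName": "InfoLinks", "legalName": "InfoLinks Technologies, Inc."}
print(build_customer_names(users, company))
```

Because the legal name is one of the candidates, a link name can reach full agreement with it even when it matches none of the personal names.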
Partial scoring rationale
The 0.5 tier is intentionally generous. A single shared surname (“Windhorst”) is meaningful evidence that two records may refer to the same person — enough to flag for review, not enough to decide alone. The combiner’s threshold filters it: a lone 0.5 contributes 0.5 × 2.5 = 1.25, well below the 2.5 gate.
Spurious overlaps — for example, the shared word "media" between "IN MEDIA RES PUBLISHING" (link 9’s unrelated name) and "LASSAD MEDIA INC" (legal name) — also land at 0.5 and are correctly filtered by the combiner rather than by the scorer.
Combiner contribution
Weight: NAME_WEIGHT = 2.5 (see scoring-combiner). A full name match (1.0) alone clears the threshold — full first+last agreement is two-token bio evidence, treated as decisive. A partial name match (0.5) requires corroboration from phone or email.
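The arithmetic behind these two cases, as a small sketch. NAME_WEIGHT comes from the text above. THRESHOLD is a hypothetical name for the 2.5 gate, and the inclusive comparison (>=) is an assumption inferred from the statement that a full match alone clears it.

```python
NAME_WEIGHT = 2.5  # from the text above
THRESHOLD = 2.5    # hypothetical name for the 2.5 gate


def name_contribution(score):
    """Weighted contribution of the name score to the combined total."""
    return score * NAME_WEIGHT


def name_alone_clears(score):
    # Assumes the gate is inclusive, so 1.0 * 2.5 == 2.5 passes.
    return name_contribution(score) >= THRESHOLD


print(name_contribution(1.0), name_alone_clears(1.0))
print(name_contribution(0.5), name_alone_clears(0.5))
```

A partial match leaves 1.25 still to be supplied by the phone or email scorers before the gate is reached.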