Scoring — Nicknames

mercury.nicknames.load_nicknames(path: Path | None = None) -> dict[str, str]

File format

A flat CSV, one equivalence class per line:

cyrus,cy,cyril
cyrenius,swene,cy,serene,renius,cene
aaron,erin,ronnie,ron

Every token on a line is interchangeable with every other token on that line. Each line is lowercased on load, so every token in the resulting mapping is lowercase.
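A minimal sketch of reading this format, using Python's csv module as a stand-in for whatever the real loader does. `parse_nickname_lines` is a hypothetical helper. Stripping whitespace around tokens and skipping blank lines are assumptions, not documented behavior.

```python
import csv
import io


def parse_nickname_lines(text: str) -> list[list[str]]:
    """Hypothetical helper: one lowercased token list per non-blank line."""
    groups: list[list[str]] = []
    for row in csv.reader(io.StringIO(text)):
        # Assumption: surrounding whitespace is stripped and empty cells dropped.
        tokens = [cell.strip().lower() for cell in row if cell.strip()]
        if tokens:  # blank lines yield empty rows and are skipped
            groups.append(tokens)
    return groups


sample = """cyrus,cy,cyril
cyrenius,swene,cy,serene,renius,cene
Aaron,Erin,Ronnie,Ron
"""
print(parse_nickname_lines(sample)[2])  # ['aaron', 'erin', 'ronnie', 'ron']
```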

Transitive merging

Tokens that appear on more than one line are merged transitively via union-find. In the example above, "cy" appears on both the cyrus and cyrenius lines, so those two lines collapse into a single equivalence class:

{cyrus, cy, cyril, cyrenius, swene, serene, renius, cene}

Any member matches any other. This is deliberate: "Cy" is legitimately ambiguous among all of them, and we don't want to guess.
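The transitive merge can be sketched with a small union-find (disjoint-set) structure. This is an illustrative version, not the loader's actual code: `merge_classes` is a hypothetical name, and it takes the already-parsed token lists from the example file.

```python
def merge_classes(lines: list[list[str]]) -> list[set[str]]:
    """Union every token on a line with the line's first token; return the classes."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving: point nodes at their grandparent to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for tokens in lines:
        for tok in tokens:
            find(tok)  # register every token, including single-token lines
        for tok in tokens[1:]:
            union(tokens[0], tok)

    # Group tokens by their root to recover the equivalence classes.
    classes: dict[str, set[str]] = {}
    for tok in parent:
        classes.setdefault(find(tok), set()).add(tok)
    return list(classes.values())


lines = [
    ["cyrus", "cy", "cyril"],
    ["cyrenius", "swene", "cy", "serene", "renius", "cene"],
    ["aaron", "erin", "ronnie", "ron"],
]
classes = merge_classes(lines)
print(sorted(len(c) for c in classes))  # [4, 8]
```

The shared "cy" joins the first two lines into one eight-member class, while the aaron line stays separate.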

Canonical form

The loader returns a token → canonical mapping in which the canonical representative of each class is its lexicographically smallest member; for the merged class above, that is cene. Tokens outside any class are absent from the dict.
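Given the merged classes, building the token → canonical mapping is a single pass. The sketch below assumes the classes are already available as sets; `canonical_map` is a hypothetical helper name.

```python
def canonical_map(classes: list[set[str]]) -> dict[str, str]:
    """Map every member of each class to the class's smallest member."""
    mapping: dict[str, str] = {}
    for members in classes:
        rep = min(members)  # lexicographically smallest string
        for tok in members:
            mapping[tok] = rep
    return mapping


cy_class = {"cyrus", "cy", "cyril", "cyrenius", "swene", "serene", "renius", "cene"}
ron_class = {"aaron", "erin", "ronnie", "ron"}
mapping = canonical_map([cy_class, ron_class])
print(mapping["cy"], mapping["cyril"], mapping["ron"])  # cene cene aaron
```

"cene" wins because "ce" sorts before "cy"; a name in no class, such as "bob", simply has no key.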

Callers (typically scoring-name) canonicalize their tokens before intersecting token sets, so nickname variants collapse to a single key.
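A sketch of the caller side, assuming tokens missing from the mapping are kept as-is (the mapping omits them, so `dict.get` with the token as its own default is a natural choice). `canonicalize` is a hypothetical helper, and the mapping below is a hand-written excerpt.

```python
def canonicalize(tokens: set[str], nicknames: dict[str, str]) -> set[str]:
    # Tokens with no entry in the mapping pass through unchanged.
    return {nicknames.get(tok, tok) for tok in tokens}


nicknames = {"cy": "cene", "cyril": "cene", "cyrus": "cene", "cene": "cene"}
a = {"cy", "windhorst"}
b = {"cyril", "windhorst"}

print(sorted(a & b))  # ['windhorst']
print(sorted(canonicalize(a, nicknames) & canonicalize(b, nicknames)))  # ['cene', 'windhorst']
```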

Caching

load_nicknames is decorated with functools.cache, so its result is cached per path argument. The default file (the interview fixture at interview/extra-questions/nicknames.txt) is therefore read once and kept for the lifetime of the process.
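The caching behavior can be illustrated with a stand-in function. `load_nicknames_sketch` is hypothetical and deliberately simplified (no transitive merging); it only shows that functools.cache runs the body once per distinct argument and hands back the same dict object afterwards.

```python
import functools
import tempfile
from pathlib import Path

calls = 0  # how many times the function body actually runs


@functools.cache
def load_nicknames_sketch(path: Path) -> dict[str, str]:
    """Hypothetical stand-in: map each token to the smallest token on its line."""
    global calls
    calls += 1
    mapping: dict[str, str] = {}
    for line in path.read_text().lower().splitlines():
        tokens = [t.strip() for t in line.split(",") if t.strip()]
        if not tokens:
            continue
        rep = min(tokens)
        for tok in tokens:
            mapping[tok] = rep
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    nick_file = Path(tmp) / "nicknames.txt"
    nick_file.write_text("aaron,erin,ronnie,ron\n")
    first = load_nicknames_sketch(nick_file)
    second = load_nicknames_sketch(nick_file)  # served from the cache

print(calls, first is second)  # 1 True
```

Two general properties of functools.cache apply to any function decorated this way: the cached dict is shared by every caller, so mutating it affects all of them, and the cache key is the arguments as passed, so `f()`, `f(None)`, and `f(path=None)` occupy separate entries even when they resolve to the same file.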

Of the nine fixtures, only link 6 depends on nicknames: the link's name is "Cy J. Windhorst" and the user's is "Cyril Windhorst". Without nicknames, the only shared token is the surname, and score_name returns 0.5. With nicknames, both cy and cyril canonicalize to cene, so the names share two tokens (cene and windhorst) and score_name returns 1.0, enough for the combiner to flip the verdict to Match.
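The link 6 arithmetic can be reproduced with a hypothetical scoring rule: shared tokens divided by the size of the smaller token set. This is not necessarily score_name's actual formula; it is only an assumption that happens to yield the quoted 0.5 and 1.0. The tokenizer (lowercase, keep alphabetic runs) is also assumed.

```python
import re


def tokens_of(name: str) -> set[str]:
    # Assumption: lowercase and keep alphabetic runs, so "J." becomes "j".
    return set(re.findall(r"[a-z]+", name.lower()))


def overlap_score(a: set[str], b: set[str]) -> float:
    # Hypothetical rule: shared tokens / size of the smaller set.
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def canon_set(tokens: set[str], nicknames: dict[str, str]) -> set[str]:
    return {nicknames.get(t, t) for t in tokens}


nicknames = {"cy": "cene", "cyril": "cene"}  # excerpt of the loaded mapping

link = tokens_of("Cy J. Windhorst")  # tokens: cy, j, windhorst
user = tokens_of("Cyril Windhorst")  # tokens: cyril, windhorst

print(overlap_score(link, user))  # 0.5
print(overlap_score(canon_set(link, nicknames), canon_set(user, nicknames)))  # 1.0
```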