Methodology

Every number on this site is reproducible from public data. Here is where each one comes from and how we compute it.

First names

First names come from the Social Security Administration's national baby name files, which list given names from Social Security card applications for births back to 1880, broken out by sex. The SSA omits any name with fewer than five occurrences in a given year. We merge every year into a single table per name and keep the annual counts so we can draw popularity charts.
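As a minimal sketch of that merge step, assuming each yearly file holds one "Name,Sex,Count" line per name: the function name build_name_table and the sample counts below are illustrative, not our production code or real SSA figures.

```python
from collections import defaultdict


def build_name_table(yearly_rows):
    """Merge (year, 'Name,Sex,Count') rows into {(name, sex): {year: count}}."""
    table = defaultdict(dict)
    for year, line in yearly_rows:
        name, sex, count = line.strip().split(",")
        table[(name, sex)][year] = int(count)
    return dict(table)


# Made-up counts for illustration only; these are not real SSA figures.
sample_rows = [
    (1905, "Mary,F,100"),
    (1905, "John,M,90"),
    (2005, "Mary,F,40"),
]
name_table = build_name_table(sample_rows)
print(name_table[("Mary", "F")])  # {1905: 100, 2005: 40}
```

Keeping the per-year counts, rather than a single total, is what makes both the popularity charts and the age-weighted estimates possible.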

Last names

Surnames come from the US Census Bureau's decennial "Frequently Occurring Surnames" file. The file lists every surname reported by 100 or more people at the last census, with the count, per-100,000 frequency, and a self-reported ancestry breakdown across six categories.

Living bearer estimates

For first names we apply the CDC's most recent life tables to each birth year. The count of babies born in 1905 is multiplied by the probability of still being alive at their current age, which is close to zero; the count from 2005 is left essentially unchanged. We sum across all years to get an age-weighted estimate of living bearers.
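The age weighting can be sketched roughly as follows. The function living_estimate, the survival_to_age lookup, and every number in the example are hypothetical placeholders, not actual CDC life-table values; in practice the lookup would be done separately for each sex.

```python
def living_estimate(births_by_year, survival_to_age, current_year):
    """Sum births per year, each weighted by the chance of surviving to today."""
    total = 0.0
    for birth_year, births in births_by_year.items():
        age = current_year - birth_year
        # Assumption: ages missing from the table are treated as no survivors.
        total += births * survival_to_age.get(age, 0.0)
    return total


# Placeholder inputs: 1,000 births in each of two years.
births = {1905: 1000, 2005: 1000}
# Placeholder survival probabilities from birth to the given age.
survival = {20: 0.99, 120: 0.0}
estimate = living_estimate(births, survival, current_year=2025)
print(round(estimate))
```

With these placeholder inputs the 1905 cohort contributes nothing and the 2005 cohort contributes almost all of its births, which mirrors the behavior described above.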

For last names we apply the per-100,000 frequency to the current US resident population to project today's number of bearers. This assumes the surname's share of the population has not shifted since the last census.
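A minimal sketch of that projection, using a hypothetical function name and placeholder inputs rather than real Census or population figures:

```python
def project_surname_bearers(per_100k, current_population):
    """Scale a per-100,000 frequency up to the current population."""
    return per_100k * current_population / 100_000


# Placeholder values: a surname at 50 per 100,000 and a population of 330 million.
projected = project_surname_bearers(50.0, 330_000_000)
print(round(projected))
```

Because the frequency is held fixed, the projection moves only with the total population figure that is plugged in.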

Full name combinations

Multiplying two independent frequencies gives an expected number of people with a given full name. We take the first name's living estimate as a share of the population, multiply it by the surname's share, and scale the product back up by the population. This assumes first and last names are statistically independent, which is a simplification but gives reasonable back-of-the-envelope numbers.
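Under the independence assumption the arithmetic reduces to first-name bearers times surname bearers divided by the population. The sketch below uses a hypothetical function name and placeholder inputs.

```python
def expected_full_name_bearers(first_living, last_living, population):
    """Expected bearers of a full name, assuming first and last names are independent."""
    first_share = first_living / population
    last_share = last_living / population
    return first_share * last_share * population


# Placeholder inputs: 1,000,000 living bearers of the first name,
# 100,000 bearers of the surname, in a population of 330 million.
expected = expected_full_name_bearers(1_000_000, 100_000, 330_000_000)
print(round(expected))
```

The result is an expectation, not a count, so values below 1 simply mean the combination is unlikely to occur even once.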

Rarity buckets

We bucket living-bearer estimates into five tiers: Very Common (500K+), Common (100K to 500K), Uncommon (10K to 100K), Rare (1K to 10K), and Very Rare (under 1K). The thresholds apply equally to first and last names.
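The tiers can be expressed as a simple threshold lookup. This sketch assumes each tier includes its lower bound (so exactly 100K counts as Common), which the tier descriptions above do not spell out; the function name rarity_tier is illustrative.

```python
def rarity_tier(living_bearers):
    """Map a living-bearer estimate to one of the five rarity tiers."""
    # Assumption: lower bounds are inclusive, upper bounds exclusive.
    if living_bearers >= 500_000:
        return "Very Common"
    if living_bearers >= 100_000:
        return "Common"
    if living_bearers >= 10_000:
        return "Uncommon"
    if living_bearers >= 1_000:
        return "Rare"
    return "Very Rare"


print(rarity_tier(165_000))  # Common
print(rarity_tier(303))      # Very Rare
```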

Known limitations

The SSA files only include people who were issued a Social Security number, so early-20th-century records are less complete.
The Census surname file excludes names borne by fewer than 100 people.
Ancestry percentages are self-reported, which undercounts mixed and Hispanic identities in the oldest tables.
Life-table-based estimates assume average US mortality by sex and age. Subgroups with different mortality profiles will diverge.