Decennial Surname Identifiers
Description
These data contain disambiguated, person-level surname identifiers that group together identical or similar surnames. We create a unique identifier for each surname and attach it to individual-level records in the 2000 and 2010 decennial data. Three versions of the surname identifier are available: one that groups together only exactly identical surnames, one that groups surnames based on SAS DQ95 match codes, and one that groups surnames based on SAS DQ90 match codes. SAS DQ match codes blend Soundex and other string normalization algorithms to create hash codes that group together similar pieces of text. A DQ95 match code is "stricter" than a DQ90 match code, meaning surnames must be more similar to be grouped together. The surname identifiers are longitudinally consistent between the 2000 and 2010 decennial data: a given surname receives the same identifier in both the 2000 and 2010 surname ID files. With these data, users can determine whether any two individual records in the 2000 and/or 2010 decennial census data share the same surname.
Note: Users accessing these surname identifiers must not have surname PII in their project space.
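As a minimal sketch of the comparison described above, the following Python snippet checks whether two records share a surname under a chosen identifier version. The field names (`surname_id_exact`, `surname_id_dq95`, `surname_id_dq90`, `census_year`) and the identifier values are hypothetical and used only for illustration; the actual variable names and formats in the files may differ. No surname text is involved, only the identifiers.

```python
from typing import Any, Dict

# Hypothetical column names for the three identifier versions.
# The real variable names in the surname ID files may differ.
ID_COLUMNS = {
    "exact": "surname_id_exact",  # groups only exactly identical surnames
    "dq95": "surname_id_dq95",    # groups by SAS DQ95 match code (stricter)
    "dq90": "surname_id_dq90",    # groups by SAS DQ90 match code (looser)
}


def share_surname(rec_a: Dict[str, Any], rec_b: Dict[str, Any],
                  version: str = "exact") -> bool:
    """Return True if both records carry the same surname identifier.

    Because the identifiers are longitudinally consistent, the records
    may come from the 2000 file, the 2010 file, or one from each.
    A missing identifier is treated as "no match" (an assumption).
    """
    column = ID_COLUMNS[version]
    id_a, id_b = rec_a.get(column), rec_b.get(column)
    if id_a is None or id_b is None:
        return False
    return id_a == id_b


# Made-up records: one from each census year, with invented identifier values.
rec_2000 = {"census_year": 2000, "surname_id_exact": 101,
            "surname_id_dq95": 5001, "surname_id_dq90": 9001}
rec_2010 = {"census_year": 2010, "surname_id_exact": 102,
            "surname_id_dq95": 5001, "surname_id_dq90": 9001}

print(share_surname(rec_2000, rec_2010, "exact"))  # False
print(share_surname(rec_2000, rec_2010, "dq95"))   # True
```

In this invented example the two records have different exact-match identifiers but the same DQ95 identifier, which is the pattern one would expect for two similar but not identical spellings of a surname.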
Scope and Coverage
- Surnames in the 2000 and 2010 Decennial Census
- National