SAP Metadata Portal
Decennial Surname Identifiers

Description

These data contain disambiguated, person-level surname identifiers that group together identical or similar surnames. We create a unique identifier for each surname and attach it to individual-level records in the 2000 and 2010 decennial data. Three versions of the surname identifier are available: one that groups together only exactly identical surnames, one that groups surnames based on SAS DQ95 match codes, and one that groups surnames based on SAS DQ90 match codes. SAS DQ match codes blend Soundex and other string normalization algorithms to create hash codes that group together similar pieces of text. SAS DQ95 match codes are stricter than SAS DQ90 match codes, so the DQ95 identifier groups fewer spelling variants together than the DQ90 identifier. The surname identifiers are longitudinally consistent between the 2000 and 2010 decennial data: a single surname identifier identifies the same surname in both the 2000 and 2010 surname ID files. With these data, users can determine whether any two individual records in the 2000 and/or 2010 decennial census data share the same surname.

Note: Users accessing these surname identifiers must not have surname PII in their project space.
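
The grouping idea described above can be illustrated with a minimal sketch. It is not the Census Bureau's procedure: SAS DQ match codes are proprietary, so plain Soundex stands in here for a "loose" match code, and an upper-cased, trimmed string stands in for the exact-match key. The record layout, the toy surnames, and the names `soundex`, `assign_ids`, `exact`, and `loose` are all assumptions made for illustration. One ID table is shared across both years, which is what makes the identifiers longitudinally consistent, and the output carries only the identifier, not the surname.

```python
# Hypothetical illustration only: Soundex is a stand-in for SAS DQ match codes.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character American Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) map to "" and reset prev
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]


def assign_ids(records, key_fn):
    """Attach a surname_id to each record; equal keys share one id."""
    ids = {}  # one table for all years keeps ids longitudinally consistent
    out = []
    for rec in records:
        key = key_fn(rec["surname"])
        if key not in ids:
            ids[key] = len(ids) + 1
        # The surname itself is dropped from the output.
        out.append({"year": rec["year"], "person": rec["person"],
                    "surname_id": ids[key]})
    return out


# Toy records (invented, not census data).
records = [
    {"year": 2000, "person": "p1", "surname": "Smith"},
    {"year": 2000, "person": "p2", "surname": "Smyth"},
    {"year": 2000, "person": "p3", "surname": "Jones"},
    {"year": 2010, "person": "p4", "surname": "SMITH"},
    {"year": 2010, "person": "p5", "surname": "Jones"},
    {"year": 2010, "person": "p6", "surname": "Smythe"},
]

exact = assign_ids(records, lambda s: s.strip().upper())  # exact-match version
loose = assign_ids(records, soundex)                      # fuzzy version

for e, l in zip(exact, loose):
    print(e["year"], e["person"], e["surname_id"], l["surname_id"])
```

In this sketch the exact version gives "Smith" (2000) and "SMITH" (2010) the same identifier but keeps "Smyth" and "Smythe" separate, while the loose version places all four records in one group. A stricter match code would sit between these two extremes.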

Metadata

Identification and Summary

Title
Decennial Surname Identifiers
Source(s)
Census Bureau
Authorizer(s)
Census Bureau
An official website managed by the National Science Foundation -   ncses.nsf.gov