- Date: Spring 2007, Thursdays 4:00 - 6:30pm, with 4 international participating sites.
- Instructor: Prof. John M. Abowd (john.abowd@cornell.edu)
- Sponsor: This course is sponsored by the National Science Foundation Information Technologies Research Program under grant SES #0427889.
The course is designed to teach students all the basics required to acquire and
transform raw information into social and economic data. Legal, statistical,
computing, and social science aspects of the data "production" process
will be treated. Major emphasis will be placed on U.S. Census data that
are accessible from the Census Bureau's Research Data Center network.
This version of the course has been specially prepared for graduate students
who are planning to use RDC-based data or are seriously considering it.
RDC-based data products covered include the new Longitudinal Employer-Household
Dynamics (LEHD) micro data; the Longitudinal Business Database (LBD) and
its predecessor the Longitudinal Research Database (LRD); internal versions
of the Survey of Income and Program Participation (SIPP), Current Population
Survey (CPS), American Community Survey (ACS), American Housing Survey
(AHS), and the 1990 and 2000 Decennial Census of Population and Housing;
the Employer Business Register (BR and SSEL); the Censuses and Annual
Surveys of Manufactures, Mining, Services, Retail Trade, Wholesale Trade,
Construction, Transportation, Communications, and Utilities; the Business
Expenditures Survey; Characteristics of Business Owners; and others. Students
will be introduced to the NSF-sponsored Virtual Research Data Center.
Core topics include:
- Basic statistical principles of populations and sampling frames
- Acquiring data via samples, censuses, administrative records, and transaction logging
- Law, economics, and statistics of data privacy and confidentiality protection
- Data linking and integration techniques (probabilistic record linking; multivariate statistical matching)
- Data imputation techniques
- Analytic methods for complex linked data sets