- Basic statistical principles of populations and sampling frames (no survey background assumed)
- Acquiring data via samples, censuses, administrative records, transaction logging, and web scraping
- Law, economics and statistics of data privacy and confidentiality protection
- Data linking and integration techniques (probabilistic record linking; multivariate statistical matching)
- Data editing and imputation techniques
- Analytical methods for complex linked data sets, relational databases, and networks
- To understand the history and components of the U.S. federal statistical system, and how these functions are organized in some other countries--you should be able to find the data you want and know who controls access to them
- To recognize the source data for federal statistical products, and use these files properly even if they are only supported as restricted-access confidential data--once you have the source data you should know how to analyze them whether or not they were edited and released for public use
- To understand the data acquisition, edit, imputation, weighting, confidentiality protections, publications, and underlying microdata for major household and business data products in the federal statistical system--in preparing and executing your analysis, you should be able to take responsibility for the data preparation needed to create accurate, useful analysis files
- To use spatial, temporal, and network modeling methods, especially Bayesian hierarchical models, as research tools when working with the microdata and public-use files from major household and business data products--you should be able to recognize and model the statistical and econometric complexities that occur when data are aggregated over time and space and from multiple sources
- To produce replicable, properly curated research results based on confidential and public-use data files--you should know how to document the complete provenance of your analysis and the curation of essential elements for reproduction of your results from the original data files
Lars Vilhuber, Cornell University
Warren Brown, Cornell University
We draw on expert guest lecturers for a variety of topics. A complete updated list is available here.
Margo Anderson (University of Wisconsin – Milwaukee) presents on the history of the federal statistical system (flipped classroom). She will be present to discuss the lecture.
Readings and other information
- Anderson, Margo. The American Census: A Social History, Second Edition. Yale University Press, 2015.
- Anderson, Margo J., and Seltzer, William. “Federal Statistical Confidentiality and Business Data: Twentieth Century Challenges and Continuing Issues.” Journal of Privacy and Confidentiality 1.1 (2009): 7-52, 55-58.
About the Guest Lecturer
Margo Anderson, University of Wisconsin – Milwaukee
This class coincides with the FSRDC system’s annual conference. There will be no in-classroom activity at most sites on this day (please check with your local coordinator). The content of this section will be discussed on Sept 21, 2017, so students should take the time to view the materials on edX during this week.
Health statistics, energy statistics, agricultural statistics, others. Register-based statistics, organic data. Details to come.
- Health statistics (Lecture Notes: INFO7470-S7-Parker, Jennifer Parker (NCHS))
- Agricultural statistics (Lecture Notes: INFO7470-S7-DunnHueth, additional materials, INFO7470-S7-Migrant Farm Labor in the Census of Agriculture, Richard Dunn (University of Connecticut) and Brent Hueth (University of Wisconsin-Madison))
- BLS data in the FSRDC (Lecture Notes: Session 7 – Monaco – BLS Data in the RDC, Kristen Monaco and Nicole Nestoriak (BLS))
Flipped classroom on access to restricted-access data. Students will be introduced to the research proposal mechanism of the Federal Statistical Research Data Center.
Discussion will focus on how to access various restricted-access data sets. Guest presenters may be present live in the videoconference classroom.
Part 3 switches gears and discusses the need for, and the requirements of, replicable science (in general, and in restricted-access environments). This part is a live lecture by Lars Vilhuber.
- Restricted Access Data: INFO7470-S8-Proposals, Kristen Monaco on BLS proposal review, Matthias Umkehrer on IAB access
- Replicable Science: INFO7470-S9-Replicable Science
The class is a flipped classroom, with discussion by “guest lecturer” John Abowd.
Introduction to record linking
- What is record linking, what is it not, what is the theory?
- Record linking: applications and examples – How do you do it, what do you need, what are the possible complications?
- Examples of record linking
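As a rough illustration of the probabilistic record linking theory listed above, the following sketch scores a candidate record pair with Fellegi-Sunter style agreement weights. It is not taken from the course materials: the field names, the m- and u-probabilities, and the function name `match_weight` are all hypothetical, and real applications estimate these probabilities from the data rather than fixing them by hand.

```python
import math

# Hypothetical (m, u) probabilities per field:
#   m = P(fields agree | records are a true match)
#   u = P(fields agree | records are not a match)
M_U = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "zip": (0.85, 0.02),
}


def match_weight(rec_a, rec_b, m_u=M_U):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in m_u.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


a = {"last_name": "smith", "birth_year": 1970, "zip": "14850"}
b = {"last_name": "smith", "birth_year": 1970, "zip": "14853"}
print(match_weight(a, b))
```

Pairs with a high total weight are candidate links and pairs with a low weight are candidate non-links; the cutoffs between link, clerical review, and non-link are a design choice that depends on the error rates an application can tolerate.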
Total quality evaluation – errors from coverage, sampling, edit, and imputation.
- Formal models of edits and imputations
- Missing data overview
- Missing records – Frame or census – Survey
- Missing items
- Overview of different products
- Overview of methods
- Formal multiple imputation methods
- INFO7470 S10 -Statistical Tools Edit and Imputation
- INFO7470 S11 -Statistical Tools Edit and Imputation Examples
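For the multiple imputation topic listed above, a minimal sketch of the standard combining rules (Rubin's rules) may help fix ideas. It assumes the analyst has already produced one point estimate and one variance estimate from each of m completed data sets; the function name `combine_rubin` and the example numbers are hypothetical and not drawn from the lecture notes.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m completed-data estimates into one estimate and total variance."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    q_bar = mean(estimates)      # combined point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t


# Made-up results from three imputed data sets
estimate, total_var = combine_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(estimate, total_var)
```

The between-imputation term is what distinguishes this from treating a single imputed file as if it were fully observed: it carries the extra uncertainty due to the missing data into the final variance.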
The lab (an edit and imputation exercise) will be posted on the INFO7470x edX site. You will need to create a program, and upload the program (language of your choice) to edX.
- Why must users of restricted-access data learn about confidentiality protection?
- What is statistical disclosure limitation?
- What are privacy-preserving data mining and differential privacy?
- Basic methods for disclosure avoidance (SDL)
- Rules and methods for model-based SDL
- SDL-based noise methods
- Synthetic data
- Differential privacy methods
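To make the differential privacy topic above concrete, here is a minimal sketch of the Laplace mechanism applied to a single count query. It assumes a sensitivity of 1 (adding or removing one person changes the count by at most 1); the function name `noisy_count` and the parameter values are hypothetical, and a production system would need a more careful noise sampler than Python's default random number generator.

```python
import random


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(7470)  # seeded only to make the illustration repeatable
print(noisy_count(100, epsilon=0.5, rng=rng))
```

A smaller epsilon gives stronger protection but a larger noise scale, which is the privacy-accuracy trade-off that the session topics refer to.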
- Part A: Spatial Analysis (Nicholas Nagle of University of Tennessee – Knoxville)
- Part B: Network Analysis (John Abowd, Cornell University)
Part A: Spatial Analysis
- Basic Geocoding
- Tools for Geocoding
- Analysis Methods
- Tools for Geographic Analysis
About the Guest Lecturer
Nicholas Nagle, University of Tennessee – Knoxville
Part B: Network Analysis
This part of the lecture is a live class.
About the Guest Lecturer
John Abowd, Cornell University and now U.S. Census Bureau
Requirements
Any student enrolled in a Ph.D. or Master's program at one of the participating universities may take this course. Students at Cornell register for INFO 7470 (or the identical ECON7400/ILRLE7400). Some programming experience (in any statistical programming language) is required for some of the labs. Some statistical or econometric training is required for some of the lectures.
Enrollment
In addition to following the local registration rules at each of the participating sites, all students will also register in an edX Edge class. The URL will be updated at a later time.
Lectures
The course has two types of lectures. The first half of the course (roughly Sessions 3-8) is in a "flipped classroom" setting, whereas the latter half is a more traditional lecture style. The type of lecture will be identified for each date on the course calendar.
Flipped classroom sessions:
- In-classroom discussions, additional materials, and guest lectures will occur at the time and date listed on the course calendar.
- Lectures should be viewed on edX Edge prior to the classroom time.
- Exercises and labs on edX Edge should be completed prior to the classroom time.
- In-classroom time is expected to be shorter than the usual full length, likely about 1 hour, depending on classroom participation.
Traditional lecture sessions:
- In-classroom lecture and discussion will occur at the time and date listed on the course calendar.
- The expected length corresponds to the listed time.
- Additional materials may be made available on edX Edge.
- Exercises and labs will be uploaded and graded on edX Edge.