Basic purpose to access the OTM data
OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) is a web-based, interactive mapping application released by the LEHD program at the US Census Bureau. The objective is to show where people work and where workers live on maps with companion reports on their age, earnings, and industry distributions. The underlying data (OTM data) are public-use data available for access and download on the Cornell VirtualRDC (http://www.vrdc.cornell.edu/news/?page_id=4), an internet-accessible computing environment dedicated to the exploration and development of synthetic data.
What can be downloaded and accessed
Since 2005, the U.S. Census Bureau has released multiple versions of the data underlying the OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) application. This site holds
- OnTheMap 2.0 data files
- OnTheMap 3.0 data files
- Years: 2002-2006
- Exceptions: 2002 data for AR and 2002 and 2003 data for MS do not exist
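When scripting downloads or loops over states and years, the coverage rules above can be encoded once and reused. The following is only a minimal sketch: the names `YEARS`, `MISSING`, and `is_available` are hypothetical, it uses two-letter postal abbreviations, and it does not check whether a given state takes part in OnTheMap at all, because this page does not list the participating states.

```python
# Coverage as stated on this page: years 2002-2006, with three
# state-year combinations that do not exist.
YEARS = range(2002, 2007)  # 2002 through 2006 inclusive
MISSING = {("AR", 2002), ("MS", 2002), ("MS", 2003)}


def is_available(state, year):
    """Return True if the page's coverage rules allow this state-year.

    Assumption: `state` is a two-letter postal abbreviation. The function
    does not validate that the state is included in OnTheMap.
    """
    return year in YEARS and (state.upper(), year) not in MISSING


if __name__ == "__main__":
    for state, year in [("AR", 2002), ("AR", 2003), ("MS", 2003), ("MS", 2004)]:
        print(state, year, is_available(state, year))
```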
What users should know about the data
The place of residence counts are generated from a synthetic data model that conditions on disclosure-proofed place of work counts and other observable characteristics. Each of the implicate files available for the origin-destination (OD) and residence area characteristics (RAC) data represents an independent draw from the synthetic data model. Detailed information on the full OTM data and the synthetic data model can be found in the data documentation (also see updates for OTM v3).
The U.S. Census Bureau encourages use of the multiple implicates of the OTM data. LEHD Program research has found that three (3) implicates are usually sufficient to determine the extent to which the confidentiality protections affect the statistical results. Users who wish to explore the OTM data with additional implicates should contact LEHD directly.
The base geography for version 3.0 of OnTheMap is TIGER 2006 Second Edition. An archival copy can be found here.
An important caveat applies to the analysis of synthetic data (and, with some minor differences, to imputed data elements): parameter estimates based on a single implicate are unlikely to equal the parameter estimates based on the underlying observed data, because the synthesizing process introduces additional variation into the data. However, it is possible to get more precise estimates by using multiple synthetic datasets, which is why more than one implicate can be downloaded. Leveraging multiple implicates is straightforward: the user's analysis is repeated independently on each of the implicates, and the resulting parameter estimates are combined using formulae described in Raghunathan, Reiter, Rubin (2003).
For further information on how to properly analyze multiply synthesized or imputed datasets, see
- Raghunathan, Reiter, Rubin (2003), "Multiple Imputation for Statistical Disclosure Limitation," Journal of Official Statistics, 19:1, pgs. 1-16
- Reiter (2004), "New Approaches to Data Dissemination: A glimpse into the future (?)", Chance, 2004:17, pgs. 12-16
- Abowd and Lane (2003), "Synthetic data and confidentiality protection", Technical paper TP-2003-10, LEHD, U.S. Census Bureau
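The "repeat the analysis on each implicate, then combine" procedure described above can be sketched in a few lines. This is an illustrative sketch, not code distributed with the OTM data: the function name `combine_implicates` and the returned keys are hypothetical. It assumes the user has already obtained, from each implicate, a scalar point estimate and an estimate of its variance. The point estimate is the average across implicates. Two variance estimators are returned because the appropriate one depends on the synthesis design: `(1 + 1/m) * b - v_bar` is the form given for fully synthetic data in Raghunathan, Reiter, Rubin (2003), while `b / m + v_bar` is the form commonly used for partially synthetic data. Consult the references before choosing one.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine per-implicate results for one scalar parameter.

    estimates: point estimates, one per implicate
    variances: estimated variances of those point estimates, one per implicate
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicates and matching lengths")

    q_bar = mean(estimates)   # combined point estimate
    b = variance(estimates)   # between-implicate variance (divisor m - 1)
    v_bar = mean(variances)   # average within-implicate variance

    return {
        "estimate": q_bar,
        "between_variance": b,
        "within_variance": v_bar,
        # Fully synthetic form; note that it can come out negative.
        "variance_fully_synthetic": (1 + 1 / m) * b - v_bar,
        # Partially synthetic form.
        "variance_partially_synthetic": b / m + v_bar,
    }


if __name__ == "__main__":
    # Made-up numbers for three implicates, purely to show the call.
    result = combine_implicates([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
    for name, value in result.items():
        print(name, value)
```

With only three implicates the between-implicate variance is estimated from very few draws, so the combined variance should be read with that in mind.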
The process of getting an account and access
To access the OTM data, interested users need to apply for a download account on the VirtualRDC by contacting the VirtualRDC administrators (mailto:virtualrdc@cornell.edu). There is no project approval process. Optionally, users can also obtain an account on one of our compute servers; more information is provided in the online VirtualRDC guide (http://www.vrdc.cornell.edu/news/?p=13). Access is open to any user wishing to use the data for research purposes. We only ask that you provide comments, analysis, feedback and/or published papers. That information can be provided through the following list servers:
- led-qwi@lists.census.gov (QWI user community)
- lehd-ltd@lists.census.gov (Local Transportation Dynamics community under development)
- ctpp_news@chrispy.net (Census Transportation Planning Package community)
Technical requirements
Downloading data and analyzing it on your own computer
To analyze the data on their own computers, users need to bring their own statistical software and, depending on the analysis, significant memory. Access is through a regular Web browser in the OnTheMap Download Area (http://www.vrdc.cornell.edu/onthemap/data/). Some useful SAS and Stata programs are available on the VirtualRDC OTM website (http://www.vrdc.cornell.edu/onthemap/).
Accessing data and analyzing it on the VirtualRDC
To log in to the VirtualRDC compute nodes, users will need:
- an internet-connected computer
- SSH client software
- (for graphical interface, optional) VNC or NX client software
All required software is open-source and free of charge; see the online VirtualRDC guide (http://www.vrdc.cornell.edu/news/?p=13) for download links and detailed instructions. Windows and Linux clients are supported, but all software is known to work on Mac OS X as well. Statistical software is provided free of charge for use on the VirtualRDC compute nodes for research purposes; see the Installed software page (http://www.vrdc.cornell.edu/news/?page_id=83) for a partial list. Once logged on to the VirtualRDC compute nodes, you will find the data and some useful SAS and Stata programs under
/mixed/onthemap/
The programs are also available on the VirtualRDC OTM website (http://www.vrdc.cornell.edu/onthemap/).
Where to get help
For further information and assistance, contact the VirtualRDC administrators (mailto:virtualrdc@cornell.edu).
Funding and disclaimers
The VirtualRDC is not affiliated with the US Census Bureau. All data made available at this facility are public-use data. The VirtualRDC is partially funded by NSF Grants #0427889 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0427889), #0339191 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0339191) and #9978093 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9978093) and donations by Novell (http://www.novell.com/linux/) and Intel (http://www.intel.com).