
Basic purpose of the OTM data

OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) is a web-based, interactive mapping application released by the LEHD program at the US Census Bureau. The objective is to show where people work and where workers live on maps with companion reports on their age, earnings, and industry distributions. The underlying data (OTM data) are public-use data available for access and download on the Cornell VirtualRDC (http://www.vrdc.cornell.edu/news/?page_id=4), an internet-accessible computing environment dedicated to the exploration and development of synthetic data.

What can be downloaded and accessed

Since 2005, the U.S. Census Bureau has released multiple versions of the data underlying the OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) application. This site holds

  • OnTheMap 2.0 data files
  • OnTheMap 3.0 data files
Version 1 files are no longer accessible on the VirtualRDC.

OTM data can be downloaded from the OnTheMap download area on the VirtualRDC, with a previously obtained login and password (see below). Data are stored in compressed CSV format. For your convenience, read-in programs for SAS, Stata, and MySQL can be found on the web at http://www.vrdc.cornell.edu/onthemap/. Note that there is no guarantee that these programs read in the data correctly, although we have used them ourselves in the past. The data package consists of Origin-Destination data (OD), Residence Area Characteristics data (RAC), Workplace Area Characteristics data (WAC), and block-group level Quarterly Workforce Indicator data (QWI). Data files exist for the states listed on the LEHD website, with the following coverage:
  • Years: 2002-2006
  • Exceptions: 2002 data for AR and 2002 and 2003 data for MS do not exist
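For users who work in Python rather than SAS, Stata, or MySQL, the files can also be streamed with the standard library alone. This is a minimal sketch, not one of the provided read-in programs: it assumes the compressed CSV files are gzip-compressed and begin with a header row (the compression format is not stated above). The function names `read_otm_csv`, `count_rows`, and `sum_column` are hypothetical, and any column name passed to `sum_column` must be taken from the actual file header and the data documentation.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_otm_csv(path: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a compressed CSV file as a dict keyed by the header.

    Assumption: the file is gzip-compressed and its first line is a header.
    Adjust the opener if the downloaded files use another format (e.g. zip).
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        yield from csv.DictReader(handle)


def count_rows(path: str) -> int:
    """Count data rows (excluding the header) without loading the whole file."""
    return sum(1 for _ in read_otm_csv(path))


def sum_column(path: str, column: str) -> float:
    """Sum a numeric column; `column` must match a name in the file's header."""
    return sum(float(row[column]) for row in read_otm_csv(path))
```

For example, `count_rows("some_state_od_file.csv.gz")` (a hypothetical file name) returns the number of records in one file. Streaming rows one at a time keeps memory use low, which matters for the larger OD files.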

What users should know about the data

The place of residence counts are generated from a synthetic data model that conditions on disclosure-proofed place of work counts and other observable characteristics. Each of the implicate files available for the OD and RAC data represents an independent draw from the synthetic data model. Detailed information on the full OTM data and the synthetic data model can be found in the data documentation (also see updates for OTM v3).

The U.S. Census Bureau wants to encourage use of the multiple implicates of the OTM data. LEHD Program research has found that three (3) implicates are usually sufficient to determine the extent to which the confidentiality protections affect the statistical results. Users who wish to explore the OTM data with additional implicates should contact LEHD directly.

The base geography for version 3.0 of OnTheMap is TIGER 2006 Second Edition. An archival copy can be found here.

An important caveat applies to the analysis of synthetic data (and, with some minor differences, to imputed data elements): parameter estimates based on a single implicate are unlikely to equal the parameter estimates based on the underlying observed data. This is because the synthesizing process introduces additional variation into the data. However, more precise estimates can be obtained by using multiple synthetic datasets, which is why more than one implicate can be downloaded. Leveraging multiple implicates is straightforward: the user's analysis is repeated independently on each implicate, and the resulting parameter estimates are combined using the formulae described in Raghunathan, Reiter, Rubin (2003).
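The repeat-then-combine procedure can be sketched in Python. This is an illustration under stated assumptions, not an official implementation: it assumes each implicate's analysis yields a point estimate q_l and a variance estimate u_l, and it applies the fully synthetic combining rules from Raghunathan, Reiter, Rubin (2003) for m implicates: the combined estimate q_bar_m is the average of the q_l, b_m is the sample variance of the q_l across implicates, u_bar_m is the average of the u_l, and the variance of q_bar_m is estimated by T_f = (1 + 1/m) * b_m - u_bar_m. The names `combine_fully_synthetic`, `analyze_implicates`, and `mean_estimator` are hypothetical; check the OTM data documentation to confirm which combining rules suit a given analysis.

```python
from statistics import mean, variance
from typing import Callable, NamedTuple, Sequence, Tuple, TypeVar

Data = TypeVar("Data")


class CombinedEstimate(NamedTuple):
    point: float        # q_bar_m: average of the per-implicate point estimates
    between_var: float  # b_m: sample variance of the point estimates across implicates
    within_var: float   # u_bar_m: average of the per-implicate variance estimates
    total_var: float    # T_f = (1 + 1/m) * b_m - u_bar_m (can be negative for small m)


def combine_fully_synthetic(estimates: Sequence[float],
                            variances: Sequence[float]) -> CombinedEstimate:
    """Combine per-implicate results using the fully synthetic rules (sketch)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicates and one variance per estimate")
    q_bar = mean(estimates)
    b_m = variance(estimates)  # sample variance, divides by m - 1
    u_bar = mean(variances)
    t_f = (1 + 1 / m) * b_m - u_bar
    return CombinedEstimate(q_bar, b_m, u_bar, t_f)


def analyze_implicates(implicates: Sequence[Data],
                       estimator: Callable[[Data], Tuple[float, float]]) -> CombinedEstimate:
    """Run the same analysis independently on each implicate, then combine."""
    results = [estimator(data) for data in implicates]
    return combine_fully_synthetic([est for est, _ in results],
                                   [var for _, var in results])


def mean_estimator(values: Sequence[float]) -> Tuple[float, float]:
    """Example estimator: sample mean and its estimated variance s^2 / n."""
    return mean(values), variance(values) / len(values)
```

With three implicates, a user would pass the three data sets to `analyze_implicates` together with any estimator that returns an (estimate, variance) pair. Note that T_f can come out negative when m is small; the sketch reports it as computed and does not apply any adjustment.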

For further information on how to properly analyze multiply synthesized or imputed datasets, consult Sessions 8a and 8b of the online INFO 747 class at Cornell University's CISER at http://www.vrdc.cornell.edu/info747/2005/course_outline.html. For a more complete bibliography, consult the bibliography of the OTM public data documentation.


The process of getting an account and access

In order to access the OTM data, interested data users need to apply for a download account on the VirtualRDC by contacting the VirtualRDC administrators (mailto:virtualrdc@cornell.edu). There is no project approval process. Optionally, users can also obtain an account on one of our compute servers; more information is provided in the online VirtualRDC guide (http://www.vrdc.cornell.edu/news/?p=13). Access is open to any user wishing to use the data for research purposes. We only ask that you provide comments, analysis, feedback and/or published papers. That information can be provided through the following list servers:

The research and evaluation results will be used to improve our understanding of, and development efforts for, future versions of OnTheMap and synthetic data in general.

Technical requirements

Downloading data and analyzing it on your own computer

In order to analyze the data on their own computers, users need their own statistical software and, depending on the analysis, substantial memory. Access is through a regular Web browser in the OnTheMap Download Area (http://www.vrdc.cornell.edu/onthemap/data/). The read-in programs are available on the VirtualRDC OTM website (http://www.vrdc.cornell.edu/onthemap/).

Accessing data and analyzing it on the VirtualRDC

In order to log in to the VirtualRDC compute nodes, users will need

  • an internet-connected computer
  • SSH client software
  • (for graphical interface, optional) VNC or NX client software

All required software is open-source and free of charge, see the online VirtualRDC guide (http://www.vrdc.cornell.edu/news/?p=13) for download links and detailed instructions. Windows and Linux clients are supported, but all software is known to work on Mac OS X as well. Statistical software is provided free-of-charge for use on the VirtualRDC compute nodes for research purposes, see the Installed software page (http://www.vrdc.cornell.edu/news/?page_id=83) for a partial list. Once logged on to the VirtualRDC compute nodes, you will find the data and some useful SAS and Stata programs under

  /mixed/onthemap/
The programs are also available on the VirtualRDC OTM website (http://www.vrdc.cornell.edu/onthemap/).

Where to get help

For further information and assistance, contact the VirtualRDC administrators (mailto:virtualrdc@cornell.edu).

Funding and disclaimers

The VirtualRDC is not affiliated with the US Census Bureau. All data made available at this facility are public-use data. The VirtualRDC is partially funded by NSF Grants #0427889 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0427889), #0339191 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0339191) and #9978093 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9978093) and donations by Novell (http://www.novell.com/linux/) and Intel (http://www.intel.com).