The LTDB provides estimates using 2010 boundaries for a standard set of variables from 1970 through 2000. These data may meet the needs of many users. As an initial step, we provide two sets of files. The "full count" files include the key variables from Census 2010 (race and Hispanic origin, age by race and Hispanic origin, household composition, and housing occupancy and tenure), with comparable variables from the censuses in 1970-2000 and adjusted to 2010 boundaries. The "sample count" files include a wide range of other variables based on sample data -- the one-in-six samples from the decennial census in 1970-2000 and the sample data from the American Community Survey for 2006-2010. These include variables such as socioeconomic characteristics that are widely used by researchers.
On the LTDB download page you will find a codebook that identifies the original tables in each census year from which data have been taken. You will also find the standard harmonized data set for 1970-2010. For reference, we provide the five individual data files that include the original tract IDs and data for each decade. CLICK HERE for the download page.
More versatile is a set of tools that allows users to input their own data. Key to this system are crosswalks for each prior year, serving the same purpose as the proprietary Block Weighting File developed by NCDB for 1990-2000. For every decennial year from 1970 to 2000, a crosswalk file is provided in which every row lists a 2010 tract ID, the ID of a tract in the source year that contributes to it, and the share of the source tract’s population attributes that should be allocated to the 2010 tract. In cases where there is an exact correspondence between the source tract and the 2010 tract, there is only one row of data for the 2010 tract. Otherwise there are as many rows as there are contributing tracts. (For completeness, the crosswalk file includes every contributing tract, regardless of how small a fraction of its population should be allocated to the 2010 tract. Therefore many 2010 tracts that we categorize as “no change” also list 2000 tracts that contribute to it, typically with very small weights.)
The crosswalk files include the tract ID in 2010 (which incorporates a state, county, and tract FIPS code), the 2010 metropolitan area (formally the Core Based Statistical Area or CBSA) code, and a flag to identify location in a central city in 2010 (referred to by the Census Bureau as a “principal city” in an MSA or metropolitan division. For the 2000-2010 crosswalk we provide one additional indicator that we believe will assist users of the interpolated data: whether there was a boundary change involving this tract and if so what type of change occurred between 2000 and 2010 (1 = no change; 2 = consolidation; 3 = split; 4 = many-to-many).
CLICK HERE for the download page.
The LTDB also offers code in Access and STATA that can be used, for any given year, in conjunction with the crosswalk file and an input data file prepared by the researcher. This code facilitates bridging any data organized within tract boundaries as early as 1970 to 2010 boundaries.
The format of the tract ID variable in the input data file is critical. It needs to be an 11 character string variable in the form of "SSCCCTTTTTT", where SS = FIPS state, CCC = FIPS county, and TTTTTT = census tract.
The Access and STATA programs allow researchers to input two types of variables. One type is “count” variables. These are variables, like the population of a tract or the number of disabled persons in a tract, that should be handled by summing the counts in origin tracts (with appropriate weights provided by the crosswalk file) to counts in the 2010 tract. For such variables, the user simply needs to add/select input variable names from their data file.
There are other kinds of variables that cannot simply be summed across source tracts. The typical case is a median or average or rate. Consider the example of median household income. Here the best estimate is a weighted average, where one calculates the average value (not the sum) of the median incomes of source tracts but weights them according to the number of households being contributed by each source tract. The user must identify the variable name to be handled this way along with a corresponding variable to be used as the weight. Typically in cases of medians or rates, this weighting variable will be the number of cases that the median represents or the denominator that was used in calculating the rate.
In STATA these variable names need to be added into the initial lines of code. Both types of variables can be handled in a single run.
In Access the count variables and medians/rates with associated weighting variables are identified through queries. The program is set up to do only one type of variable at a time. Users with both types of variables simply need to run Access twice.
Users not familiar with Access should be aware of some details in the procedure. Microsoft may generate a security warning about a file (on a line above the Access windows), and the user needs to click on that warning (“Options”) and enable use of the file. Using the “External Data” menu, Access can import a text file, Excel file, or DBF file to be used as the input data file. The input data file will then be added to the list of tables in the Tables window. This file should not have spaces in its name. If it does, the name can be changed after importing (right-click on the file name in the Tables window). At that point the queries in the main window can be responded to: select a census year for the interpolation, select the input data table name and identify the key field (the tract ID variable), and select any number of variables to be interpolated. (Or, alternatively, select variables for which a weighted average should be estimated and their associated weighting variables.) Provide a name for the output file (default is “interpolate_table), click on “Interpolate” and the output file will be created.
A special kind of variable that LTDB allows users harmonize is a dummy variable that takes on values of 0 and 1. Consider the example of a dummy variable for whether there is a toxic waste facility in the tract. If there is such a facility in every source tract, it would be desirable for the 2010 value to be 1, and if there is none in any source tract the 2010 value should be 0. If there is a toxic waste facility in some source tracts but not others, the value should intermediate. Certainly this is not like summing a count across tracts. We adjust dummy variables to 2010 boundaries in a way similar to count variables. They will be allocated according to the same weighting procedures, but the output variable must be interpreted differently. If all source tracts are coded zero on the dummy variable, the value for the 2010 tract will be zero. If any of the source tracts is coded one, the value for the 2010 tract will be equal to the sum of the proportions of the source tracts that contribute to it. Users might often recode this output variable simply to zero and non-zero.
The output file from Access (a comma-delimited .txt file) or STATA (a STATA data file) lists all of the 2010 information about the tract from the crosswalk file and values of the input variables converted to 2010 boundaries.
CLICK HERE for the download page.