Census geography: Pooling adjacent tracts to improve reliability of estimates

Beginning in December 2008, three-year estimates for geographic units with populations of 20,000 or more have been released annually. At the end of 2010, the ACS began to release five-year estimates for all geographic units. Pooling five years of data was initially expected to yield tract estimates whose reliability is comparable to the sample count estimates from 2000. It now seems likely that tract estimates will not achieve this level of reliability, and researchers who wish to use tract data will be interested in examining ways to pool data over multiple tracts.

Another complication is that tract estimates released at the end of 2010 will best reflect the tracts’ population composition in 2007, the midpoint of 2005-2009. However, short-form data from Census 2010 provide new information that may require adjustments in those estimates. The Bureau now plans to base the 2006-2010 five-year estimates on population counts from Census 2010, which will remove one source of error in these data. Subsequently, five-year rolling averages will be used to report annual trends in tract composition, so that new estimates of tract-level ACS survey variables will be available in every successive year.

  • Andrew Beveridge (Queens College, CUNY) has expressed concerns with the methods used by the Census Bureau to estimate the Margin of Error (MOE). His memorandum and the response prepared by the Census Bureau clarify some of the issues that are involved. These are posted here: http://www.scribd.com/doc/61741043/Memo-Regarding-ACS-With-Response.
  • David Wong (George Mason University) has developed some tools under contract with the Census Bureau to take account of the MOE problem in spatial analysis and mapping. These tools and further explanation are available here: http://gesg.gmu.edu/

One approach to the MOE problem is to pool together data from adjacent tracts. For example, in one US2010 report (“Separate and Unequal”), John Logan defined the “neighborhood” that people lived in as their census tract plus each of the surrounding tracts. These neighborhoods, of course, overlap. US2010 has developed data files that identify the adjacent tracts for every census tract in the nation in 1990, 2000, and 2005-2009. (Although the 2005-2009 ACS uses 2000 tract IDs in most cases, there are cases where the 2010 ID was used in error.) Researchers can use these files to aggregate census data for larger neighborhood areas. The adjacency files can also be used to create “spatial lag” variables, that is, variables that describe the context in which a tract is embedded. Such variables are increasingly used in studies of neighborhood effects.
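As a rough illustration of pooling, the sketch below sums a count estimate over each tract's neighborhood (the tract itself plus its adjacent tracts) and combines the margins of error with the root-sum-of-squares approximation that the Census Bureau documents for sums of ACS estimates. The tract ids, estimates, and MOEs are invented for the example, and the approximation ignores covariance between tracts, so the pooled MOE should be read as indicative only.

```python
import math
from collections import defaultdict

# Hypothetical neighbor list: (focal id, neighbor id) pairs.
# Each tract is listed with itself, as in the adjacency files described here.
pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3)]

# Invented count estimates and 90 percent margins of error, keyed by tract id.
estimate = {1: 400.0, 2: 300.0, 3: 500.0}
moe = {1: 120.0, 2: 90.0, 3: 160.0}


def pool(pairs, estimate, moe):
    """Return {focal id: (pooled estimate, approximate pooled MOE)}.

    The pooled MOE is sqrt(sum of squared MOEs), an approximation
    that assumes the tract estimates are uncorrelated.
    """
    total = defaultdict(float)
    moe_squared = defaultdict(float)
    for focal, neighbor in pairs:
        total[focal] += estimate[neighbor]
        moe_squared[focal] += moe[neighbor] ** 2
    return {f: (total[f], math.sqrt(moe_squared[f])) for f in total}


pooled = pool(pairs, estimate, moe)
for focal, (est, err) in sorted(pooled.items()):
    print(focal, est, round(err, 1), round(err / est, 3))
```

In this made-up example, tract 1 has a relative MOE of 30 percent on its own (120/400) and about 18 percent after pooling with its two neighbors (roughly 219/1200).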

Download the files necessary to identify, for every census tract, its adjacent tracts. In GIS terms, this is equivalent to a weights matrix based on queen contiguity.

For each year we provide two files, which together give users the ability to determine all of the tracts that are adjacent to a given tract. This capability would be especially useful for spatial analyses that require a weights matrix, such as spatial regression.
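One way to turn a neighbor list into a weights matrix is sketched below. It assumes the list arrives as (focal id, neighbor id) pairs in which each tract is also paired with itself, drops those self-pairs (spatial weights conventionally have a zero diagonal), and row-standardizes so that each tract's weights sum to 1. The ids and values are hypothetical, and the sparse dictionary layout is just one convenient representation, not the format of the downloadable files.

```python
from collections import defaultdict

# Hypothetical neighbor list, including each tract paired with itself.
pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3)]

# Invented tract-level variable (for example, a poverty rate in percent).
values = {1: 10.0, 2: 20.0, 3: 40.0}


def row_standardized_weights(pairs):
    """Build sparse weights {focal: {neighbor: weight}} with rows summing to 1.

    Self-pairs are skipped. A tract with no neighbors other than itself
    gets no row at all and would need separate handling.
    """
    neighbors = defaultdict(list)
    for focal, neighbor in pairs:
        if focal != neighbor:
            neighbors[focal].append(neighbor)
    return {
        focal: {n: 1.0 / len(ns) for n in ns}
        for focal, ns in neighbors.items()
    }


def spatial_lag(weights, values):
    """Weighted average of the variable over each tract's neighbors."""
    return {
        focal: sum(w * values[n] for n, w in row.items())
        for focal, row in weights.items()
    }


weights = row_standardized_weights(pairs)
lag = spatial_lag(weights, values)
print(lag)
```

With row-standardized weights, the spatial lag of a variable is simply its mean across the adjacent tracts, which is a common form of contextual variable in spatial regression.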

You will download a zip file that includes two files. The file named tract_nnnn (where nnnn is the year) lists the key geographic identifiers for tracts: a state, county and tract number, a tractid that concatenates these into a single number, and a tract id (GISJOIN) that stores this number as a string variable. It also gives every focal tract a simple sequential id number (FID) that can be used to identify it and its adjacent tracts. In the file named nlist_nnnn, every focal tract is listed in one column with one identifier (FID), and each of its neighboring tracts (and the tract itself) is listed as a separate case with another identifier (NID).
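The two files can be combined along the following lines. This is a minimal sketch for the comma-delimited format: the column headers (FID, NID, GISJOIN), the assumption that NID uses the same sequential numbering as FID, and the placeholder GISJOIN values are all guesses made for illustration, so check them against the files you actually download. In-memory text stands in for the real files, which would be opened with `open(...)`.

```python
import csv
import io

# Stand-in for the tract file: sequential id plus a string tract id.
# The GISJOIN values are placeholders, not real tract identifiers.
tract_file = io.StringIO(
    "FID,GISJOIN\n"
    "1,GA\n"
    "2,GB\n"
    "3,GC\n"
)

# Stand-in for the neighbor list file: one row per focal/neighbor pair,
# with each focal tract also listed as its own neighbor.
nlist_file = io.StringIO(
    "FID,NID\n"
    "1,1\n"
    "1,2\n"
    "2,1\n"
    "2,2\n"
    "2,3\n"
    "3,2\n"
    "3,3\n"
)


def adjacency_by_gisjoin(tract_handle, nlist_handle):
    """Map each tract's GISJOIN to the GISJOINs of its listed neighbors."""
    # Look up the string tract id from the sequential id.
    lookup = {row["FID"]: row["GISJOIN"] for row in csv.DictReader(tract_handle)}
    adjacency = {}
    for row in csv.DictReader(nlist_handle):
        focal = lookup[row["FID"]]
        neighbor = lookup[row["NID"]]
        adjacency.setdefault(focal, []).append(neighbor)
    return adjacency


adjacency = adjacency_by_gisjoin(tract_file, nlist_file)
print(adjacency)
```

Keying the result by GISJOIN rather than by the sequential id makes it easier to merge the neighbor lists with census or ACS extracts that carry the same string identifier.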

Select the year(s): 1990, 2000, 2005-09 ACS
Select the format: SPSS/PC+ Version 19, Comma Delimited, SAS V9+ Windows, STATA Version 8 SE


 

© Spatial Structures in the Social Sciences, Brown University